Re: [Discussed] Integrating SPARQL-Gremlin 0.2 Plugin with the TinkerPop codebase

Stephen Mallette Thu, 25 Jan 2018 11:37:38 -0800

Check this out:

gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal(SparqlTraversalSource.class).
......1>           withStrategies(SparqlStrategy.instance())
==>sparqltraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.sparql("SELECT ?name ?age WHERE { ?person v:name ?name . ?person
v:age ?age }")
==>[name:marko,age:29]
==>[name:vadas,age:27]
==>[name:josh,age:32]
==>[name:peter,age:35]


The work is horribly hacked together at the moment and I've not pushed it
to the development branch yet, but that's the general idea for how
gremlin-sparql will be used based on what we talked about earlier in this
thread. Pretty neat?



On Wed, Jan 24, 2018 at 3:38 PM, Stephen Mallette <[email protected]>
wrote:

> I just wanted to quickly note that sparql-gremlin is now building properly
> on the TINKERPOP-1878 branch (i just pushed some changes to clean up some
> pom.xml/dependency conflicts issues). As we discussed in this thread, the
> branch currently contains a fairly bare bones model and it will need some
> work to get it complete enough for it to be considered for merge to a
> release branch. In a way that's good, because it will give the community a
> chance to shape exactly how sparql-gremlin will work.
>
> On Tue, Jan 9, 2018 at 10:47 AM, Harsh Thakkar <[email protected]> wrote:
>
>> Hi Stephen,
>>
>> It does make sense to me. The work is going on slow but steady. Let's
>> wait and see how other devs feel about this, as you said.
>>
>> Cheers,
>> Harsh
>> On 2018-01-09 16:31, Stephen Mallette <[email protected]> wrote:
>> > I've had some thoughts on this thread since December. Since
>> sparql-gremlin
>> > has a pretty long to-do list and there is likely a lot of discussion
>> > required on this list prior to it being ready for merge to a release
>> > branch, it seems like we might treat this as a normal feature under
>> > development. I think we should just merge it to a development branch in
>> the
>> > TinkerPop repository and then collaborate on it from there. We've taken
>> > similar approaches with other "long term" pull requests which has
>> allowed
>> > the code to develop as it would typically would. I'm thinking that's a
>> > better approach than a "big-bang" pull request.
>> >
>> > Harsh, if that's ok with you, feel free to issue your PR against master
>> and
>> > I'll get it setup against a development branch on our end (no rush,
>> please
>> > give it a few days to see if everyone is ok with that approach).
>> >
>> > On Mon, Dec 18, 2017 at 5:16 PM, Stephen Mallette <[email protected]
>> >
>> > wrote:
>> >
>> > > > Should I also remove the northwind file?
>> > >
>> > > I think I'd prefer to see all of our sparql examples use the existing
>> toy
>> > > graphs - better not to add more options - so I'd remove it as well. If
>> > > anyone disagrees, I don't really feel too strongly about not
>> including it,
>> > > but it would be good to hear some reasoning as to why the existing
>> datasets
>> > > that we already package are insufficient for users to learn with.
>> > >
>> > > >  will need some help (quite possibly) with getting things right as
>> far
>> > > as the DSL pattern for the gremlin language variants is concerned.
>> > >
>> > > We can help point you in the right direction when you get stuck or
>> need to
>> > > clarify things. If you get really stuck, we can move to step 2 and
>> have you
>> > > issue a PR sooner than later and we'll just merge what you have to a
>> > > development branch so others can collaborate with you on it more
>> easily.
>> > > Let's see how things develop.
>> > >
>> > > > Also, since you are very well versed in the test suite, I would also
>> > > request some assistance for the same when we are there :) as it is our
>> > > first time pushing a work to the production level. So bear with us :)
>> > >
>> > > no worries. i will need to think on the testing approach. my thinking
>> will
>> > > be focused on what i would call integration tests i.e. tests that
>> evaluate
>> > > sparql-gremlin across the entire stack. I don't imagine that you need
>> my
>> > > input to write some unit tests to validate the workings of your
>> current
>> > > code though.
>> > >
>> > > > One question, though there is not a strict deadline, when is the
>> 3.3.2
>> > > release planned?
>> > >
>> > > We have no timeline on 3.3.2 at this point (we are just in the
>> process of
>> > > releasing 3.3.1 so it will be a while before we see 3.3.2). I think
>> the
>> > > merging of gremlin-javascript will likely trigger that release, i
>> would
>> > > guess no earlier than February 2018 if all goes right with that. I
>> also
>> > > don't mean to make it sound like sparql-gremlin needs to be part of
>> that
>> > > release, so if it's not ready then, it's not ready and it releases
>> with
>> > > 3.3.3. You'll find that with TinkerPop, we tend to release when
>> software is
>> > > "ready" and not by setting long range time deadlines for ourselves.
>> So,
>> > > don't worry about when we release sparql-gremlin too much. Let's stay
>> > > focused on just getting the code right.
>> > >
>> > > Thanks for your understanding.
>> > >
>> > >
>> > >
>> > >
>> > > On Mon, Dec 18, 2017 at 5:01 PM, Harsh Thakkar <[email protected]>
>> wrote:
>> > >
>> > >> Hello Stephen,
>> > >>
>> > >> Alright, I will remove the bsbm file from the repository and I refer
>> to
>> > >> it in the docs (with some examples) sharing a link to download from
>> the
>> > >> website if that is acceptable. No worries.
>> > >> Should I also remove the northwind file?
>> > >>
>> > >>
>> > >> Your expectations are reasonable, it was just that I wasn't very
>> clear
>> > >> about what needs to be done. Now it is pretty much clear. It will
>> take some
>> > >> time for me to wrap my head around the specifics of the tinkerpop
>> codebase
>> > >> in order to satisfy the 3 requirements. I will need some help (quite
>> > >> possibly) with getting things right as far as the DSL pattern for the
>> > >> gremlin language variants is concerned. I am already reading the
>> dev-docs
>> > >> on this, from here:
>> > >> http://tinkerpop.apache.org/docs/current/reference/#dsl
>> > >>
>> > >> Also, since you are very well versed in the test suite, I would also
>> > >> request some assistance for the same when we are there :) as it is
>> our
>> > >> first time pushing a work to the production level. So bear with us :)
>> > >>
>> > >> I agree with you on not having any API shifts, this does certainly
>> not
>> > >> give a good impression, also its a lot of effort down the drain.
>> Quality
>> > >> must be ensured.
>> > >>
>> > >> One question, though there is not a strict deadline, when is the
>> 3.3.2
>> > >> release planned?
>> > >>
>> > >> Cheers,
>> > >> Harsh
>> > >>
>> > >>
>> > >> On 2017-12-18 20:48, Stephen Mallette <[email protected]> wrote:
>> > >> > A quick note about (4) - Having some sample data for user
>> convenience is
>> > >> > good. Files like that though should not be "resources", but should
>> be
>> > >> added
>> > >> > here:
>> > >> >
>> > >> > https://github.com/harsh9t/tinkerpop/tree/master/data
>> > >> >
>> > >> > Placing those files there will allow them to be included in the
>> the .zip
>> > >> > distribution files we produce for Gremlin Console and Gremlin
>> Server.
>> > >> Now,
>> > >> > that BSBM file is a bit much. It's 90M in size and 22M compressed
>> to
>> > >> zip.
>> > >> > Either way, that's going to push our already large zip
>> distributions
>> > >> bigger
>> > >> > than they should be. I don't think the value of this file is worth
>> the
>> > >> > that. We can definitely make it available as a separate download
>> from
>> > >> the
>> > >> > web site if everyone thinks it's that important and then provide
>> links
>> > >> to
>> > >> > it, but I don't think it should be in the source repository as it
>> is
>> > >> now.
>> > >> >
>> > >> > Aside from (4) I just wanted to make some general points about my
>> > >> > expectations for a sparql-gremlin being part of a TinkerPop release
>> > >> branch.
>> > >> > Apologies if this wasn't clear from when we started. I think we
>> need to
>> > >> see
>> > >> > sparql-gremlin as close to a final form as possible before we look
>> to
>> > >> merge
>> > >> > it. By "final" I mean:
>> > >> >
>> > >> > 1. sparql-gremlin has a full test suite - that means good unit test
>> > >> > coverage at a minimum and integration tests as necessary (and I
>> sense
>> > >> they
>> > >> > will be necessary). I agree with marko, that we also have to
>> consider
>> > >> the
>> > >> > testing pattern carefully, so that we set the stage properly for
>> future
>> > >> > languages.
>> > >> > 2. sparql-gremlin has a clear and easy method of usage that is
>> > >> consistent
>> > >> > with how TinkerPop works - this is crucial prior to merge because
>> > >> TinkerPop
>> > >> > has high profile production usage. once merged sparql-gremlin will
>> > >> > immediately be consumed by users and we will not want to shift
>> that API
>> > >> > once it is available. we will break the code of too many people if
>> we do
>> > >> > that. we need to strive to get this right from the start.
>> > >> > 3. sparql-gremlin has a good body of user documentation.
>> > >> >
>> > >> > I don't think any of this is insurmountable, but it does mean
>> there is a
>> > >> > fair bit of work to do and it won't happen overnight. We held
>> > >> > gremlin-dotnet to the same rigorous level before merging and even
>> > >> > gremlin-javascript all these months later is still not merged for
>> > >> basically
>> > >> > the same reasons, so this is just the process that we tend to go
>> > >> through.
>> > >> > If we follow what we did for the GLVs, we will likely follow this
>> basic
>> > >> > process:
>> > >> >
>> > >> > 1. You get sparql-gremlin "pretty close" to final in your fork
>> > >> > 2. Once we all agree that you are "pretty close", you offer the
>> pull
>> > >> request
>> > >> > 3. We merge it into a TinkerPop branch for further evaluation (this
>> > >> will be
>> > >> > a development branch and not a release branch)
>> > >> > 4. We work together to get the development branch "final"
>> > >> > 5. We issue a pull request from that development branch
>> > >> > 6. The pull request goes through the standard review/vote process
>> and we
>> > >> > merge to a release branch.
>> > >> > 7. sparql-gremlin will likely be part of 3.3.2 release
>> > >> >
>> > >> > I hope that make sense.
>> > >> >
>> > >> >
>> > >> > On Mon, Dec 18, 2017 at 12:26 PM, Marko Rodriguez <
>> [email protected]
>> > >> >
>> > >> > wrote:
>> > >> >
>> > >> > > Actually, my (3) is bad. Given that query() would always return a
>> > >> > > Traversal<S, Map<String,E>>, it would be necessary to have that
>> > >> linearized
>> > >> > > to Traversal<Vertex,Vertex> for the test suite to validate it.
>> That
>> > >> would
>> > >> > > mean making SPARQLTraversal support extended Traversal methods
>> like
>> > >> > > flatMap(), blah, blah… That seems excessive, though convenient.
>> > >> > >
>> > >> > > Hm…… Thoughts?,
>> > >> > > Marko.
>> > >> > >
>> > >> > > http://markorodriguez.com
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > > > On Dec 18, 2017, at 9:21 AM, Marko Rodriguez <
>> [email protected]>
>> > >> > > wrote:
>> > >> > > >
>> > >> > > > Hello,
>> > >> > > >
>> > >> > > > A couple of items worth considering.
>> > >> > > >
>> > >> > > > Regarding (7), that should be done prior to master/ merge. It
>> is
>> > >> > > necessary to follow the patterns that are established in
>> TinkerPop
>> > >> > > regarding language interoperability. The DSL pattern developed
>> for
>> > >> Gremlin
>> > >> > > language variants seems to be the best pattern for distinct
>> languages
>> > >> as
>> > >> > > well. In essence, if your language is not a fluent language, and
>> > >> instead,
>> > >> > > uses a String, then it should be wrapped as such in a fluent
>> interface
>> > >> > > using all the Strategy, Step, and Traversal methods that makes
>> sense
>> > >> so it
>> > >> > > works within the larger infrastructure of TinkerPop (e.g.
>> testing! —
>> > >> see
>> > >> > > below). What I proposed in my previous email seems the easiest
>> and
>> > >> cleanest
>> > >> > > way to do things.
>> > >> > > >
>> > >> > > > Regarding (3), testing is crucial. Given that this would be
>> > >> TinkerPop’s
>> > >> > > first distinct language, we don’t have a pattern set forth for
>> > >> testing.
>> > >> > > However, this doesn’t mean we can’t improvise on our current
>> model.
>> > >> Off the
>> > >> > > top of my head, perhaps the best way would be to follow the
>> > >> > > ProcessTestSuite and do the SPARQL variants of those. For
>> instance:
>> > >> > > >
>> > >> > > >       https://github.com/apache/tinkerpop/blob/master/gremlin-
>> > >> > > test/src/main/java/org/apache/tinkerpop/gremlin/process/
>> > >> > > traversal/step/map/VertexTest.java#L62 <
>> https://github.com/apache/
>> > >> > > tinkerpop/blob/master/gremlin-test/src/main/java/org/apache/
>> > >> > > tinkerpop/gremlin/process/traversal/step/map/VertexTest.java
>> #L62>
>> > >> > > >
>> > >> > > > The SPARQL test version would be:
>> > >> > > >
>> > >> > > > @Override
>> > >> > > > public Traversal<Vertex, Vertex> get_g_VX1X_out(final Object
>> v1Id) {
>> > >> > > >   return sparql.query(“SELECT ?x WHERE {“ + toURI(v1Id) + “ ?a
>> ?x
>> > >> }”);
>> > >> > > > }
>> > >> > > >
>> > >> > > > In this way, sparql is your SPARQLTraversalSource for each
>> test and
>> > >> > > query() will return a Traversal typed according (query() will
>> have to
>> > >> have
>> > >> > > solid generic support). From there, you would implement each and
>> > >> every test
>> > >> > > that is semantically possible with SPARQL (where SPARQ won’t be
>> able
>> > >> to
>> > >> > > semantically cover all Gremlin tests).
>> > >> > > >
>> > >> > > > Stephen has done a lot of recent work to generalize our test
>> suite
>> > >> out
>> > >> > > of Java so it is in a language agnostic form. I haven’t been
>> > >> following that
>> > >> > > work so I’m not sure what I’m am saying above is exactly as it
>> should
>> > >> be
>> > >> > > done, but it is a start.
>> > >> > > >
>> > >> > > > HTH,
>> > >> > > > Marko.
>> > >> > > >
>> > >> > > > http://markorodriguez.com <http://markorodriguez.com/>
>> > >> > > >
>> > >> > > >
>> > >> > > >
>> > >> > > >> On Dec 18, 2017, at 7:43 AM, Harsh Thakkar <[email protected]
>> > >> <mailto:
>> > >> > > [email protected]>> wrote:
>> > >> > > >>
>> > >> > > >> Hi Stephen and All,
>> > >> > > >>
>> > >> > > >> Thanks for going through the code. I address your questions
>> below
>> > >> (in
>> > >> > > the same order):
>> > >> > > >>
>> > >> > > >> 1. Yes, this file can be removed. It was just to test the
>> traversal
>> > >> > > method.
>> > >> > > >>
>> > >> > > >> 2. Yes, I have commented out the block of tests at this moment
>> > >> since we
>> > >> > > do not need to run tests at mvn clean install time. However, I
>> kept
>> > >> it (in
>> > >> > > commented out form) if there arose a need in future for the
>> same. It
>> > >> can
>> > >> > > surely be removed if you think, it won't be necessary.
>> > >> > > >>
>> > >> > > >> 3. There were two testing units (we continued them from
>> Daniel's
>> > >> > > version), one to check whether the prefixes are being encoded
>> > >> correctly,
>> > >> > > the second one is to test whether the generated traversal is
>> correct
>> > >> (in
>> > >> > > short the compiler is functioning as it should). Since, we
>> extended
>> > >> > > previous work supporting a variety of SPARQL operators, more test
>> > >> cases can
>> > >> > > be added to validate that each of these is functioning as
>> expected.
>> > >> > > However, as I mentioned in point #2. we need not do it
>> explicitly as
>> > >> we
>> > >> > > (Dharmen and I) have already tested them on 3-4 different
>> datasets and
>> > >> > > query-sets. Now, since we did not know if that was going to be
>> > >> formally
>> > >> > > required in the future or not, we left them as it is, just
>> commented
>> > >> it out.
>> > >> > > >>
>> > >> > > >> 4. These resources are the graphml files that we wish to
>> provide
>> > >> the
>> > >> > > users, for (i) loading and querying famous datasets - the Berlin
>> > >> SPARQL
>> > >> > > Benchmark (BSBM)  (famous in the Semantic Web-RDF community) so
>> that
>> > >> they
>> > >> > > do not have to look elsewhere for the same. (ii) Also, it
>> provides a
>> > >> strong
>> > >> > > use-case for demonstrating the applicability of sparql-gremlin
>> > >> (creates
>> > >> > > trust in the SW community users) and (iii) to keep the plug-in
>> pretty
>> > >> much
>> > >> > > self-dependent.
>> > >> > > >>
>> > >> > > >> 5 & 6  YES, damn it. The IDE did this. I will revert these
>> changes.
>> > >> > > It's like when you are not looking, the IDE does things on it
>> own :-/
>> > >> > > apologies!
>> > >> > > >>
>> > >> > > >> 7. Regarding, Marko's thoughts -- Yes, I was waiting for you
>> to
>> > >> reply
>> > >> > > to the thread. I do have some thoughts on this. But first, I was
>> > >> wondering
>> > >> > > if this (what Marko suggested) is supposed to be entirely
>> implemented
>> > >> in
>> > >> > > the current version of sparql-gremlin 0.2, i.e. including the
>> > >> > > withStrategies() and withStrategies() and remote() features, or
>> it is
>> > >> to be
>> > >> > > supported eventually (after the sparql-gremlin 0.2.0) plugin is
>> > >> rolled out.
>> > >> > > Also, I am not entirely sure I got what Marko was exactly
>> suggesting.
>> > >> I
>> > >> > > bring this to light in the in-line style reply to Marko's comment
>> > >> later
>> > >> > > here.
>> > >> > > >>
>> > >> > > >> The current implementation is more of a typical compiler, the
>> > >> users,
>> > >> > > however, can use it by specifying the query file and the dataset
>> > >> against
>> > >> > > which it is to be executed via the command (once in the gremlin
>> > >> shell):
>> > >> > > >>
>> > >> > > >> gremlin> graph = TinkerGraph.open(..)
>> > >> > > >> gremlin> SparqlToGremlinCompiler.conver
>> tToGremlinTraversal(graph,
>> > >> > > "SELECT ?a WHERE {....} ")
>> > >> > > >> ==>{?x:marko, ?y:29}
>> > >> > > >> ==>{?x:josh, ?y:32}
>> > >> > > >>
>> > >> > > >>
>> > >> > > >> i.e. load a graph using pre-defined tinkerpop methods (
>> graph.io <
>> > >> > > http://graph.io/>(IoCore.gryo()).readGraph(graphName),
>> > >> > > TinkerGraph.open(), etc ) , then execute the traversal as above
>> with
>> > >> > > arguments -- (graph, queryString), where queryString = "SPARQL
>> query".
>> > >> > > >>
>> > >> > > >> Now Let me quote Marko's comment and reply in-line to bring
>> more
>> > >> > > clarity:
>> > >> > > >>
>> > >> > > >> 1. There should be a SPARQLTraversalSource which supports one
>> spawn
>> > >> > > method â€” query(String).
>> > >> > > >>      This is already happening inside the code. Therefore, we
>> do
>> > >> not
>> > >> > > need to mention it explicitly. Please correct me if I got it
>> wrong
>> > >> here.
>> > >> > > >>
>> > >> > > >> 2. SPARQLTraversal is spawned and it only supports only the
>> > >> Traversal
>> > >> > > methods â€” next(), toList(), iterate(), etc.
>> > >> > > >>      All traversal methods that are supported, available to a
>> > >> regular
>> > >> > > gremlin traversal, can be used by the sparql-gremlin compiler
>> > >> generated
>> > >> > > traversal as well.
>> > >> > > >>
>> > >> > > >> 3. query(String) adds a ConstantStep(String).
>> > >> > > >>             This is happening internally (as shown in the
>> example
>> > >> > > above), we can also make explicit. i.e. let the user only
>> provide the
>> > >> > > queryString instead of the whole "SparqlToGremlinCompiler.
>> > >> > > convertToGremlinTraversal(graph, "SELECT ?a WHERE {....} ")"
>> command.
>> > >> > > Does this make sense? or am I missing something here.
>> > >> > > >>
>> > >> > > >>
>> > >> > > >> 4. SPARQLTraversalSource has a registered SPARQLStrategy.
>> > >> > > >>      At this moment, we leave it to the default setting for
>> this
>> > >> > > strategy selection.
>> > >> > > >>
>> > >> > > >> 5. SPARQLTraversalSource should also support withStrategies(),
>> > >> > > withoutStrategies(), withRemote(), etc.
>> > >> > > >>      Once the traversal is generated, it can support all
>> strategies
>> > >> > > like any other gremlin traversal. Does this make sense to you?
>> > >> > > >>
>> > >> > > >> In a nutshell,
>> > >> > > >> What is happening is that we are converting the SPARQL
>> queryString
>> > >> into
>> > >> > > a gremlin traversal and leave it upto the tinkerpop compiler to
>> > >> choose what
>> > >> > > is best for it.
>> > >> > > >> We only map a SPARQL query to its corresponding pattern
>> matching
>> > >> > > gremlin traversal (i.e. using with .match() clause). Since, the
>> > >> > > expressibility of SPARQL is less than that of Gremlin (i.e.
>> SPARQL 1.0
>> > >> > > doesn't support/allow  performing looping and traversing
>> operations),
>> > >> we
>> > >> > > can only map what is in the scope of SPARQL language to Gremlin.
>> Once
>> > >> the
>> > >> > > traversal is generated, it is left to the tinkerpop compiler to
>> > >> select and
>> > >> > > execute a wide range of strategies ( various levels of
>> optimizations,
>> > >> et).
>> > >> > > >>
>> > >> > > >> NOTE - Also, Right now the sparql-gremlin compiler returns the
>> > >> > > traversal (string) and not Bytecode. Returning the Bytecode is
>> also
>> > >> > > completely possible, if you want so. We can just perform
>> > >> > > traversal.asAdmin().getBytecode() for this and it is done.
>> > >> > > >>
>> > >> > > >> Since, we extended Daniel's work, we have not changed the
>> names of
>> > >> > > classes, methods and variable which were used. This, however,
>> can be
>> > >> > > changed, if you suggest so.
>> > >> > > >>
>> > >> > > >> 8. Yes, working in the academia doesn't groom you much on the
>> > >> > > importance of commenting in the code by default, or for that
>> matter
>> > >> any
>> > >> > > "good-practices". I will add appropriate comment block in each
>> class
>> > >> for
>> > >> > > the javadocs.
>> > >> > > >>
>> > >> > > >> I hope the above reply address your questions to quite some
>> extent.
>> > >> > > Most of the issues are already handled internally (as I stated
>> > >> above). We
>> > >> > > can also leave some advanced features such as remote(), for the
>> 0.2.1
>> > >> > > release (though this is just an option) :D
>> > >> > > >> Having said that, Of course, we are in no hurry for the pull
>> > >> request. I
>> > >> > > also believe it makes complete sense to get things right.
>> > >> > > >>
>> > >> > > >> Cheers!
>> > >> > > >>
>> > >> > > >> On 2017-12-18 14:11, Stephen Mallette <[email protected]
>> > >> <mailto:
>> > >> > > [email protected]>> wrote:
>> > >> > > >>> Harsh, I looked at the code in a bit more detail than I have.
>> > >> Here's
>> > >> > > some
>> > >> > > >>> thoughts/questions I had as I was going through things:
>> > >> > > >>>
>> > >> > > >>> 1. Can this file be removed - it doesn't appear to have any
>> usage
>> > >> that
>> > >> > > I
>> > >> > > >>> can see:
>> > >> > > >>>
>> > >> > > >>> https://github.com/harsh9t/tinkerpop/blob/master/sparql-
>> > >> > > gremlin/src/main/java/org/apache/tinkerpop/gremlin/sparql/
>> Runable.java
>> > >> <
>> > >> > > https://github.com/harsh9t/tinkerpop/blob/master/sparql-
>> > >> > > gremlin/src/main/java/org/apache/tinkerpop/gremlin/sparql/
>> > >> Runable.java>
>> > >> > > >>>
>> > >> > > >>> 2. I note that this entire block of tests is commented out -
>> > >> should
>> > >> > > that be
>> > >> > > >>> removed?:
>> > >> > > >>>
>> > >> > > >>> https://github.com/harsh9t/tinkerpop/blob/master/sparql-
>> > >> > > gremlin/src/test/java/org/apache/tinkerpop/gremlin/sparql/
>> > >> > > SparqlToGremlinCompilerTest.java
>> > >> > > >>>
>> > >> > > >>> 3. I could be wrong, but even if you didn't remove the tests
>> > >> above, it
>> > >> > > >>> seems like unit testing is rather thin at this point. Am I
>> missing
>> > >> > > >>> something? Is there more work to do there?
>> > >> > > >>>
>> > >> > > >>> 4. I don't understand the nature of these resources:
>> > >> > > >>>
>> > >> > > >>> https://github.com/harsh9t/tinkerpop/tree/master/sparql-
>> > >> > > gremlin/src/main/resources
>> > >> > > >>>
>> > >> > > >>> Is there any need to package those with the jar? Should
>> those be
>> > >> "test"
>> > >> > > >>> resources instead? Do we need the really large
>> data/bsbm1m.graphml
>> > >> > > file for
>> > >> > > >>> any specific reason?
>> > >> > > >>>
>> > >> > > >>> 5. What are these changes to these poms?
>> > >> > > >>>
>> > >> > > >>> https://github.com/harsh9t/tinkerpop/commit/
>> > >> > > cb3b6512ea3536f556108e5a257c4586aa4d157a
>> > >> > > >>>
>> > >> > > >>> I assume that your IDE did that accidentally and it was not
>> > >> intended.
>> > >> > > >>> Please revert that change.
>> > >> > > >>>
>> > >> > > >>> 6. This looks odd too - gremlin-shaded repeated again and
>> again
>> > >> and
>> > >> > > again:
>> > >> > > >>>
>> > >> > > >>> https://github.com/harsh9t/tinkerpop/commit/
>> > >> > > 143d16f20dcaa9c915b96cdd4adf7b1504db5d36#diff-
>> > >> > > 9e90009f097eabeb25c28159571fc6a2R118
>> > >> > > >>>
>> > >> > > >>> 7. Did you have any thoughts in reference to Marko's earlier
>> > >> reply that
>> > >> > > >>> described how sparql-gremlin should be used? Right now, it
>> seems
>> > >> like
>> > >> > > the
>> > >> > > >>> code you have there is just the "engine" but lacks the piece
>> that
>> > >> > > connects
>> > >> > > >>> it into the rest of the stack. From my perspective, I think
>> we
>> > >> need to
>> > >> > > be
>> > >> > > >>> sure that users have an easy, clear and consistent way to use
>> > >> > > >>> sparql-gremlin before we can merge this work. Obviously,
>> having
>> > >> that
>> > >> > > aspect
>> > >> > > >>> of the code thought through will impact the documentation
>> that you
>> > >> > > write as
>> > >> > > >>> well, so I think you need to go down this path a bit further
>> > >> before we
>> > >> > > get
>> > >> > > >>> to the pull request stage.
>> > >> > > >>>
>> > >> > > >>> 8. We aren't big javadoc sticklers here, but we try to at
>> least
>> > >> get
>> > >> > > class
>> > >> > > >>> level javadoc in place for most classes. I don't see much
>> javadoc
>> > >> or
>> > >> > > >>> comments in the code right now. I think I'd like to see a
>> modicum
>> > >> of
>> > >> > > >>> javadoc/comments present as part of this work.
>> > >> > > >>>
>> > >> > > >>> So, that's my broad level feedback at this point. It seems as
>> > >> though
>> > >> > > there
>> > >> > > >>> are some reasonably large issues there to contend with
>> before a
>> > >> pull
>> > >> > > >>> request is worth issuing. That's not a problem, of
>> course....we
>> > >> will
>> > >> > > just
>> > >> > > >>> keep iterating toward the goal. I'm not aware of anything
>> that is
>> > >> > > pushing
>> > >> > > >>> us to rush to a pull request - I'm of the opinion that we can
>> > >> take the
>> > >> > > time
>> > >> > > >>> to get this right.
>> > >> > > >>>
>> > >> > > >>> Thanks,
>> > >> > > >>>
>> > >> > > >>> Stephen
>> > >> > > >>>
>> > >> > > >>>
>> > >> > > >>> On Fri, Dec 15, 2017 at 1:46 PM, Joshua Shinavier <
>> > >> [email protected]>
>> > >> > > wrote:
>> > >> > > >>>
>> > >> > > >>>> Hi Marko,
>> > >> > > >>>>
>> > >> > > >>>> I think we're more or less on the same page here; it's clear
>> > >> that TP3
>> > >> > > has a
>> > >> > > >>>> different API than TP2. If you look at the guts of TP3
>> GraphSail
>> > >> [1],
>> > >> > > it
>> > >> > > >>>> uses the modern APIs, and yet does adapt them to the Sail
>> > >> interface.
>> > >> > > >>>>
>> > >> > > >>>> Something like PropertyGraphSail (or an equivalent Jena
>> thing)
>> > >> still
>> > >> > > makes
>> > >> > > >>>> sense in TP3, as well. One interesting detail here is that
>> in
>> > >> TP3,
>> > >> > > vertices
>> > >> > > >>>> can have labels, which can be turned into rdf:type
>> statements
>> > >> (that,
>> > >> > > in
>> > >> > > >>>> turn, can be used to enable subclass/superclass inheritance
>> if
>> > >> the
>> > >> > > graph is
>> > >> > > >>>> combined with a RDF schema.
>> > >> > > >>>>
>> > >> > > >>>> A TP3 equivalent of SailGraph would indeed be quite
>> different in
>> > >> > > >>>> implementation -- strategies, not wrapper graph -- than
>> what we
>> > >> had
>> > >> > > for
>> > >> > > >>>> Blueprints, and yet would serve the same purpose.
>> > >> > > >>>>
>> > >> > > >>>> Josh
>> > >> > > >>>>
>> > >> > > >>>>
>> > >> > > >>>> [1]
>> > >> > > >>>> https://github.com/joshsh/graphsail/tree/master/src/
>> > >> > > >>>> main/java/net/fortytwo/tpop/sail
>> > >> > > >>>>
>> > >> > > >>>>
>> > >> > > >>>>
>> > >> > > >>>> On Fri, Dec 15, 2017 at 10:22 AM, Marko Rodriguez <
>> > >> > > [email protected]>
>> > >> > > >>>> wrote:
>> > >> > > >>>>
>> > >> > > >>>>> Hello,
>> > >> > > >>>>>
>> > >> > > >>>>> The model proposed below is in-line with TinkerPop2â€™s
>> way of
>> > >> > > thinking.
>> > >> > > >>>>> Unfortunately, TinkerPop3 and more so for TinkerPop4, the
>> Graph
>> > >> > > >>>> â€œstructure"
>> > >> > > >>>>> API will become deprecated. This means that the notion of
>> > >> > > â€œwrapping the
>> > >> > > >>>>> Graph APIâ€  has gone away for TP3 and will be completely
>> gone
>> > >> in
>> > >> > > TP4. In
>> > >> > > >>>>> TP4, there will not even be a Graph API â€” no more Vertex,
>> > >> Edge,
>> > >> > > Property,
>> > >> > > >>>>> etc. Only the concept of a Graph with only methods like
>> > >> > > >>>> Graph.traversal(),
>> > >> > > >>>>> Graph.partitions(), etc.
>> > >> > > >>>>>
>> > >> > > >>>>> Why was this route taken? In TinkerPop3, there was a need
>> to
>> > >> support
>> > >> > > any
>> > >> > > >>>>> language besides Java. This was why Gremlin bytecode and
>> the
>> > >> concept
>> > >> > > of
>> > >> > > >>>> the
>> > >> > > >>>>> Gremlin traversal machine was introduced. A provider
>> simply gets
>> > >> > > Gremlin
>> > >> > > >>>>> bytecode and has to do something with it. For the
>> Java-based
>> > >> Gremlin
>> > >> > > >>>>> traversal machine, this is why providers implement their
>> own
>> > >> > > GraphStep,
>> > >> > > >>>>> VertexStep, etc. For a Python-based Gremlin traversal
>> machine,
>> > >> > > likewiseâ€¦
>> > >> > > >>>>>
>> > >> > > >>>>> This means that SailGraph, GraphSail, PropertyGraphSail as
>> > >> stated
>> > >> > > below
>> > >> > > >>>>> donâ€™t make sense in the current and future architectures.
>> > >> > > >>>>>
>> > >> > > >>>>> The next question becomes, "well how would you turn an RDF
>> store
>> > >> > > into a
>> > >> > > >>>>> PropertyGraph?â€  Easy â€” implement your own custom
>> GraphStep,
>> > >> > > VertexStep,
>> > >> > > >>>>> etc. and respective ProviderStrategies that will handle the
>> > >> bytecode
>> > >> > > >>>>> compilation accordingly.
>> > >> > > >>>>>
>> > >> > > >>>>> The next question becomes, â€œwell how would a
>> PropertyGraph
>> > >> support
>> > >> > > >>>>> reasoning?â€  Easy â€” implement your own custom
>> > >> DecorationStrategy
>> > >> > > that will
>> > >> > > >>>>> insert reasoning into the traversal giving the RDFS
>> schema. For
>> > >> > > instance:
>> > >> > > >>>>>        g.V().out(â€œlikesâ€ )
>> > >> > > >>>>>                ==>
>> > >> > > >>>>>        g.V().out(â€œknowsâ€ ,â€ likesâ€ )
>> > >> > > >>>>>                iff â€œlikesâ€  is a sub-property of
>> â€œknowsâ€
>> > >> > > >>>>>
>> > >> > > >>>>> In essence, it is possible to do this integration of RDF
>> and
>> > >> > > TinkerPop,
>> > >> > > >>>> it
>> > >> > > >>>>> just needs to be done at the correct level of abstraction
>> so
>> > >> that it
>> > >> > > >>>> stays
>> > >> > > >>>>> in line with how TinkerPop is evolving, not how it was
>> back in
>> > >> 2012.
>> > >> > > >>>>>
>> > >> > > >>>>> Take care,
>> > >> > > >>>>> Marko.
>> > >> > > >>>>>
>> > >> > > >>>>> http://markorodriguez <http://markorodriguez/>.com
>> > >> > > >>>>>
>> > >> > > >>>>>
>> > >> > > >>>>> On 2017-12-13 07:46, Joshua Shinavier <[email protected]>
>> > >> wrote:
>> > >> > > >>>>>> Hi Harsh,>
>> > >> > > >>>>>>
>> > >> > > >>>>>> Glad you are taking Daniel's work forward. In porting the
>> code
>> > >> to
>> > >> > > the>
>> > >> > > >>>>>> TinkerPop code base, might I suggest we allow for not only
>> > >> > > >>>>> SPARQL-Gremlin,>
>> > >> > > >>>>>> but a whole suite of RDF tools as in TP2. Perhaps call the
>> > >> module>
>> > >> > > >>>>>> rdf-gremlin. Then we could have all of:>
>> > >> > > >>>>>>
>> > >> > > >>>>>> * SPARQL-Gremlin: executes standard SPARQL queries over a
>> > >> Property
>> > >> > > >>>> Graph>
>> > >> > > >>>>>> database>
>> > >> > > >>>>>> * GraphSail [1,2]: stores RDF quads in the database,
>> > >> explicitly,
>> > >> > > and>
>> > >> > > >>>>>> enables SPARQL and triple pattern queries over the quads>
>> > >> > > >>>>>> * PropertyGraphSail [3]: exposes a Property Graph with of
>> two
>> > >> > > mappings
>> > >> > > >>>>> to>
>> > >> > > >>>>>> the RDF data model>
>> > >> > > >>>>>> * SailGraph [4]: takes an RDF triple store (not natively
>> > >> supporting>
>> > >> > > >>>>>> Gremlin) and enables Gremlin queries>
>> > >> > > >>>>>> * others? I have often thought that a continuous SPARQL
>> > >> > > implementation>
>> > >> > > >>>>>> built on Gremlin would be powerful>
>> > >> > > >>>>>>
>> > >> > > >>>>>> The biggest mismatch between the TP2 suite and what might
>> be
>> > >> built
>> > >> > > for>
>> > >> > > >>>>>> Apache TinkerPop is that the previous suite was
>> implemented
>> > >> using
>> > >> > > >>>>> (Eclipse)>
>> > >> > > >>>>>> RDF4j, whereas things seem to be leaning towards (Apache)
>> Jena
>> > >> now.>
>> > >> > > >>>>>> However, the same principles could be applied.>
>> > >> > > >>>>>>
>> > >> > > >>>>>> Josh>
>> > >> > > >>>>>>
>> > >> > > >>>>>>
>> > >> > > >>>>>> [1] https://github.com/tinkerpop/blueprints/wiki/Sail-
>> > >> > > Ouplementation>
>> > >> > > >>>>>> [2] https://github.com/joshsh/graphsail>
>> > >> > > >>>>>> [3]>
>> > >> > > >>>>>> https://github.com/tinkerpop/b
>> lueprints/wiki/PropertyGraphSa
>> > >> il-
>> > >> > > >>>>> Ouplementation>
>> > >> > > >>>>>> [4] https://github.com/tinkerpop/blueprints/wiki/Sail-
>> > >> > > Implementation
>> > >> > > >>>>>
>> > >> > > >>>>> http://markorodriguez.com
>> > >> > > >
>> > >> > >
>> > >> > >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>

Re: [Discussed] Integrating SPARQL-Gremlin 0.2 Plugin with the TinkerPop codebase

Reply via email to