Harsh, I looked at the code in a bit more detail than I have. Here's some thoughts/questions I had as I was going through things:
1. Can this file be removed - it doesn't appear to have any usage that I can see: https://github.com/harsh9t/tinkerpop/blob/master/sparql-gremlin/src/main/java/org/apache/tinkerpop/gremlin/sparql/Runable.java 2. I note that this entire block of tests is commented out - should that be removed?: https://github.com/harsh9t/tinkerpop/blob/master/sparql-gremlin/src/test/java/org/apache/tinkerpop/gremlin/sparql/SparqlToGremlinCompilerTest.java 3. I could be wrong, but even if you didn't remove the tests above, it seems like unit testing is rather thin at this point. Am I missing something? Is there more work to do there? 4. I don't understand the nature of these resources: https://github.com/harsh9t/tinkerpop/tree/master/sparql-gremlin/src/main/resources Is there any need to package those with the jar? Should those be "test" resources instead? Do we need the really large data/bsbm1m.graphml file for any specific reason? 5. What are these changes to these poms? https://github.com/harsh9t/tinkerpop/commit/cb3b6512ea3536f556108e5a257c4586aa4d157a I assume that your IDE did that accidentally and it was not intended. Please revert that change. 6. This looks odd too - gremlin-shaded repeated again and again and again: https://github.com/harsh9t/tinkerpop/commit/143d16f20dcaa9c915b96cdd4adf7b1504db5d36#diff-9e90009f097eabeb25c28159571fc6a2R118 7. Did you have any thoughts in reference to Marko's earlier reply that described how sparql-gremlin should be used? Right now, it seems like the code you have there is just the "engine" but lacks the piece that connects it into the rest of the stack. From my perspective, I think we need to be sure that users have an easy, clear and consistent way to use sparql-gremlin before we can merge this work. Obviously, having that aspect of the code thought through will impact the documentation that you write as well, so I think you need to go down this path a bit further before we get to the pull request stage. 8. We aren't big javadoc sticklers here, but we try to at least get class level javadoc in place for most classes. I don't see much javadoc or comments in the code right now. I think I'd like to see a modicum of javadoc/comments present as part of this work. So, that's my broad level feedback at this point. It seems as though there are some reasonably large issues there to contend with before a pull request is worth issuing. That's not a problem, of course....we will just keep iterating toward the goal. I'm not aware of anything that is pushing us to rush to a pull request - I'm of the opinion that we can take the time to get this right. Thanks, Stephen On Fri, Dec 15, 2017 at 1:46 PM, Joshua Shinavier <[email protected]> wrote: > Hi Marko, > > I think we're more or less on the same page here; it's clear that TP3 has a > different API than TP2. If you look at the guts of TP3 GraphSail [1], it > uses the modern APIs, and yet does adapt them to the Sail interface. > > Something like PropertyGraphSail (or an equivalent Jena thing) still makes > sense in TP3, as well. One interesting detail here is that in TP3, vertices > can have labels, which can be turned into rdf:type statements (that, in > turn, can be used to enable subclass/superclass inheritance if the graph is > combined with a RDF schema. > > A TP3 equivalent of SailGraph would indeed be quite different in > implementation -- strategies, not wrapper graph -- than what we had for > Blueprints, and yet would serve the same purpose. > > Josh > > > [1] > https://github.com/joshsh/graphsail/tree/master/src/ > main/java/net/fortytwo/tpop/sail > > > > On Fri, Dec 15, 2017 at 10:22 AM, Marko Rodriguez <[email protected]> > wrote: > > > Hello, > > > > The model proposed below is in-line with TinkerPop2’s way of thinking. > > Unfortunately, TinkerPop3 and more so for TinkerPop4, the Graph > “structure" > > API will become deprecated. This means that the notion of “wrapping the > > Graph API” has gone away for TP3 and will be completely gone in TP4. In > > TP4, there will not even be a Graph API — no more Vertex, Edge, Property, > > etc. Only the concept of a Graph with only methods like > Graph.traversal(), > > Graph.partitions(), etc. > > > > Why was this route taken? In TinkerPop3, there was a need to support any > > language besides Java. This was why Gremlin bytecode and the concept of > the > > Gremlin traversal machine was introduced. A provider simply gets Gremlin > > bytecode and has to do something with it. For the Java-based Gremlin > > traversal machine, this is why providers implement their own GraphStep, > > VertexStep, etc. For a Python-based Gremlin traversal machine, likewise… > > > > This means that SailGraph, GraphSail, PropertyGraphSail as stated below > > don’t make sense in the current and future architectures. > > > > The next question becomes, "well how would you turn an RDF store into a > > PropertyGraph?” Easy — implement your own custom GraphStep, VertexStep, > > etc. and respective ProviderStrategies that will handle the bytecode > > compilation accordingly. > > > > The next question becomes, “well how would a PropertyGraph support > > reasoning?” Easy — implement your own custom DecorationStrategy that will > > insert reasoning into the traversal giving the RDFS schema. For instance: > > g.V().out(“likes”) > > ==> > > g.V().out(“knows”,”likes”) > > iff “likes” is a sub-property of “knows” > > > > In essence, it is possible to do this integration of RDF and TinkerPop, > it > > just needs to be done at the correct level of abstraction so that it > stays > > in line with how TinkerPop is evolving, not how it was back in 2012. > > > > Take care, > > Marko. > > > > http://markorodriguez <http://markorodriguez/>.com > > > > > > On 2017-12-13 07:46, Joshua Shinavier <[email protected]> wrote: > > > Hi Harsh,> > > > > > > Glad you are taking Daniel's work forward. In porting the code to the> > > > TinkerPop code base, might I suggest we allow for not only > > SPARQL-Gremlin,> > > > but a whole suite of RDF tools as in TP2. Perhaps call the module> > > > rdf-gremlin. Then we could have all of:> > > > > > > * SPARQL-Gremlin: executes standard SPARQL queries over a Property > Graph> > > > database> > > > * GraphSail [1,2]: stores RDF quads in the database, explicitly, and> > > > enables SPARQL and triple pattern queries over the quads> > > > * PropertyGraphSail [3]: exposes a Property Graph with of two mappings > > to> > > > the RDF data model> > > > * SailGraph [4]: takes an RDF triple store (not natively supporting> > > > Gremlin) and enables Gremlin queries> > > > * others? I have often thought that a continuous SPARQL implementation> > > > built on Gremlin would be powerful> > > > > > > The biggest mismatch between the TP2 suite and what might be built for> > > > Apache TinkerPop is that the previous suite was implemented using > > (Eclipse)> > > > RDF4j, whereas things seem to be leaning towards (Apache) Jena now.> > > > However, the same principles could be applied.> > > > > > > Josh> > > > > > > > > > [1] https://github.com/tinkerpop/blueprints/wiki/Sail-Ouplementation> > > > [2] https://github.com/joshsh/graphsail> > > > [3]> > > > https://github.com/tinkerpop/blueprints/wiki/PropertyGraphSail- > > Ouplementation> > > > [4] https://github.com/tinkerpop/blueprints/wiki/Sail-Implementation > > > > http://markorodriguez.com > > > > > > > > >
