Re: [Discussed] Integrating SPARQL-Gremlin 0.2 Plugin with the TinkerPop codebase

Harsh Thakkar Mon, 18 Dec 2017 14:00:54 -0800

Hello Marko,

I made a mistake mentioning earlier that the sparql-gremlin compiler returns a 
string, well it does not. It returns a graph traversal, apologies!


Regarding (7), I agree, it makes sense. I will wrap my head around how to get 
that done. I am already reading the dev-docs on this, from here:
http://tinkerpop.apache.org/docs/current/reference/#dsl 
as mentioned in the reply to Stephen.

Regarding (3), I was just not sure whether or not to include these tests, so 
left them out. This makes it clear. I will write the test cases, taking some 
help from Stephen on the specifics of Test Suite. However, these test cases 
will have to be written within the scope of SPARQL. We can not test a query 
which can not be written in SPARQL :) I guess you were implying the same.

Let me get this done and get back to you. This will take some time. No worries!

Cheers,
Harsh

On 2017-12-18 18:26, Marko Rodriguez <[email protected]> wrote: 
> Actually, my (3) is bad. Given that query() would always return a 
> Traversal<S, Map<String,E>>, it would be necessary to have that linearized to 
> Traversal<Vertex,Vertex> for the test suite to validate it. That would mean 
> making SPARQLTraversal support extended Traversal methods like flatMap(), 
> blah, blahâ¦ That seems excessive, though convenient.
> 
> Hmâ¦â¦ Thoughts?,
> Marko.
>  
> http://markorodriguez.com
> 
> 
> 
> > On Dec 18, 2017, at 9:21 AM, Marko Rodriguez <[email protected]> wrote:
> > 
> > Hello,
> > 
> > A couple of items worth considering.
> > 
> > Regarding (7), that should be done prior to master/ merge. It is necessary 
> > to follow the patterns that are established in TinkerPop regarding language 
> > interoperability. The DSL pattern developed for Gremlin language variants 
> > seems to be the best pattern for distinct languages as well. In essence, if 
> > your language is not a fluent language, and instead, uses a String, then it 
> > should be wrapped as such in a fluent interface using all the Strategy, 
> > Step, and Traversal methods that makes sense so it works within the larger 
> > infrastructure of TinkerPop (e.g. testing! â see below). What I proposed 
> > in my previous email seems the easiest and cleanest way to do things.
> > 
> > Regarding (3), testing is crucial. Given that this would be TinkerPopâs 
> > first distinct language, we donât have a pattern set forth for testing. 
> > However, this doesnât mean we canât improvise on our current model. Off 
> > the top of my head, perhaps the best way would be to follow the 
> > ProcessTestSuite and do the SPARQL variants of those. For instance:
> > 
> >     
> > https://github.com/apache/tinkerpop/blob/master/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/VertexTest.java#L62
> >  
> > <https://github.com/apache/tinkerpop/blob/master/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/VertexTest.java#L62>
> > 
> > The SPARQL test version would be:
> > 
> > @Override
> > public Traversal<Vertex, Vertex> get_g_VX1X_out(final Object v1Id) {
> >   return sparql.query(âSELECT ?x WHERE {â + toURI(v1Id) + â ?a ?x 
> > }â);
> > }
> > 
> > In this way, sparql is your SPARQLTraversalSource for each test and query() 
> > will return a Traversal typed according (query() will have to have solid 
> > generic support). From there, you would implement each and every test that 
> > is semantically possible with SPARQL (where SPARQ wonât be able to 
> > semantically cover all Gremlin tests).
> > 
> > Stephen has done a lot of recent work to generalize our test suite out of 
> > Java so it is in a language agnostic form. I havenât been following that 
> > work so Iâm not sure what Iâm am saying above is exactly as it should 
> > be done, but it is a start.
> > 
> > HTH,
> > Marko.
> > 
> > http://markorodriguez.com <http://markorodriguez.com/>
> > 
> > 
> > 
> >> On Dec 18, 2017, at 7:43 AM, Harsh Thakkar <[email protected] 
> >> <mailto:[email protected]>> wrote:
> >> 
> >> Hi Stephen and All,
> >> 
> >> Thanks for going through the code. I address your questions below (in the 
> >> same order):
> >> 
> >> 1. Yes, this file can be removed. It was just to test the traversal 
> >> method. 
> >> 
> >> 2. Yes, I have commented out the block of tests at this moment since we do 
> >> not need to run tests at mvn clean install time. However, I kept it (in 
> >> commented out form) if there arose a need in future for the same. It can 
> >> surely be removed if you think, it won't be necessary.
> >> 
> >> 3. There were two testing units (we continued them from Daniel's version), 
> >> one to check whether the prefixes are being encoded correctly, the second 
> >> one is to test whether the generated traversal is correct (in short the 
> >> compiler is functioning as it should). Since, we extended previous work 
> >> supporting a variety of SPARQL operators, more test cases can be added to 
> >> validate that each of these is functioning as expected. However, as I 
> >> mentioned in point #2. we need not do it explicitly as we (Dharmen and I) 
> >> have already tested them on 3-4 different datasets and query-sets. Now, 
> >> since we did not know if that was going to be formally required in the 
> >> future or not, we left them as it is, just commented it out.
> >> 
> >> 4. These resources are the graphml files that we wish to provide the 
> >> users, for (i) loading and querying famous datasets - the Berlin SPARQL 
> >> Benchmark (BSBM)  (famous in the Semantic Web-RDF community) so that they 
> >> do not have to look elsewhere for the same. (ii) Also, it provides a 
> >> strong use-case for demonstrating the applicability of sparql-gremlin 
> >> (creates trust in the SW community users) and (iii) to keep the plug-in 
> >> pretty much self-dependent.
> >> 
> >> 5 & 6  YES, damn it. The IDE did this. I will revert these changes. It's 
> >> like when you are not looking, the IDE does things on it own :-/ apologies!
> >> 
> >> 7. Regarding, Marko's thoughts -- Yes, I was waiting for you to reply to 
> >> the thread. I do have some thoughts on this. But first, I was wondering if 
> >> this (what Marko suggested) is supposed to be entirely implemented in the 
> >> current version of sparql-gremlin 0.2, i.e. including the withStrategies() 
> >> and withStrategies() and remote() features, or it is to be supported 
> >> eventually (after the sparql-gremlin 0.2.0) plugin is rolled out. Also, I 
> >> am not entirely sure I got what Marko was exactly suggesting. I bring this 
> >> to light in the in-line style reply to Marko's comment later here.
> >> 
> >> The current implementation is more of a typical compiler, the users, 
> >> however, can use it by specifying the query file and the dataset against 
> >> which it is to be executed via the command (once in the gremlin shell):
> >> 
> >> gremlin> graph = TinkerGraph.open(..) 
> >> gremlin> SparqlToGremlinCompiler.convertToGremlinTraversal(graph, "SELECT 
> >> ?a WHERE {....} ") 
> >> ==>{?x:marko, ?y:29}
> >> ==>{?x:josh, ?y:32}
> >> 
> >> 
> >> i.e. load a graph using pre-defined tinkerpop methods ( graph.io 
> >> <http://graph.io/>(IoCore.gryo()).readGraph(graphName), 
> >> TinkerGraph.open(), etc ) , then execute the traversal as above with 
> >> arguments -- (graph, queryString), where queryString = "SPARQL query".
> >> 
> >> Now Let me quote Marko's comment and reply in-line to bring more clarity:
> >> 
> >> 1. There should be a SPARQLTraversalSource which supports one spawn method 
> >> Ã¢ÂÂ query(String).
> >>    This is already happening inside the code. Therefore, we do not need to 
> >> mention it explicitly. Please correct me if I got it wrong here.
> >>     
> >> 2. SPARQLTraversal is spawned and it only supports only the Traversal 
> >> methods Ã¢ÂÂ next(), toList(), iterate(), etc.
> >>    All traversal methods that are supported, available to a regular 
> >> gremlin traversal, can be used by the sparql-gremlin compiler generated 
> >> traversal as well.  
> >>    
> >> 3. query(String) adds a ConstantStep(String).
> >>             This is happening internally (as shown in the example above), 
> >> we can also make explicit. i.e. let the user only provide the queryString 
> >> instead of the whole 
> >> "SparqlToGremlinCompiler.convertToGremlinTraversal(graph, "SELECT ?a WHERE 
> >> {....} ")" command. Does this make sense? or am I missing something here.
> >> 
> >> 
> >> 4. SPARQLTraversalSource has a registered SPARQLStrategy.
> >>    At this moment, we leave it to the default setting for this strategy 
> >> selection.
> >> 
> >> 5. SPARQLTraversalSource should also support withStrategies(), 
> >> withoutStrategies(), withRemote(), etc.
> >>    Once the traversal is generated, it can support all strategies like any 
> >> other gremlin traversal. Does this make sense to you?
> >> 
> >> In a nutshell, 
> >> What is happening is that we are converting the SPARQL queryString into a 
> >> gremlin traversal and leave it upto the tinkerpop compiler to choose what 
> >> is best for it. 
> >> We only map a SPARQL query to its corresponding pattern matching gremlin 
> >> traversal (i.e. using with .match() clause). Since, the expressibility of 
> >> SPARQL is less than that of Gremlin (i.e. SPARQL 1.0 doesn't support/allow 
> >>  performing looping and traversing operations), we can only map what is in 
> >> the scope of SPARQL language to Gremlin. Once the traversal is generated, 
> >> it is left to the tinkerpop compiler to select and execute a wide range of 
> >> strategies ( various levels of optimizations, et). 
> >> 
> >> NOTE - Also, Right now the sparql-gremlin compiler returns the traversal 
> >> (string) and not Bytecode. Returning the Bytecode is also completely 
> >> possible, if you want so. We can just perform 
> >> traversal.asAdmin().getBytecode() for this and it is done.
> >> 
> >> Since, we extended Daniel's work, we have not changed the names of 
> >> classes, methods and variable which were used. This, however, can be 
> >> changed, if you suggest so. 
> >> 
> >> 8. Yes, working in the academia doesn't groom you much on the importance 
> >> of commenting in the code by default, or for that matter any 
> >> "good-practices". I will add appropriate comment block in each class for 
> >> the javadocs.
> >> 
> >> I hope the above reply address your questions to quite some extent.  Most 
> >> of the issues are already handled internally (as I stated above). We can 
> >> also leave some advanced features such as remote(), for the 0.2.1 release 
> >> (though this is just an option) :D  
> >> Having said that, Of course, we are in no hurry for the pull request. I 
> >> also believe it makes complete sense to get things right. 
> >> 
> >> Cheers!
> >> 
> >> On 2017-12-18 14:11, Stephen Mallette <[email protected] 
> >> <mailto:[email protected]>> wrote: 
> >>> Harsh, I looked at the code in a bit more detail than I have. Here's some
> >>> thoughts/questions I had as I was going through things:
> >>> 
> >>> 1. Can this file be removed - it doesn't appear to have any usage that I
> >>> can see:
> >>> 
> >>> https://github.com/harsh9t/tinkerpop/blob/master/sparql-gremlin/src/main/java/org/apache/tinkerpop/gremlin/sparql/Runable.java
> >>>  
> >>> <https://github.com/harsh9t/tinkerpop/blob/master/sparql-gremlin/src/main/java/org/apache/tinkerpop/gremlin/sparql/Runable.java>
> >>> 
> >>> 2. I note that this entire block of tests is commented out - should that 
> >>> be
> >>> removed?:
> >>> 
> >>> https://github.com/harsh9t/tinkerpop/blob/master/sparql-gremlin/src/test/java/org/apache/tinkerpop/gremlin/sparql/SparqlToGremlinCompilerTest.java
> >>> 
> >>> 3. I could be wrong, but even if you didn't remove the tests above, it
> >>> seems like unit testing is rather thin at this point. Am I missing
> >>> something? Is there more work to do there?
> >>> 
> >>> 4. I don't understand the nature of these resources:
> >>> 
> >>> https://github.com/harsh9t/tinkerpop/tree/master/sparql-gremlin/src/main/resources
> >>> 
> >>> Is there any need to package those with the jar? Should those be "test"
> >>> resources instead? Do we need the really large data/bsbm1m.graphml file 
> >>> for
> >>> any specific reason?
> >>> 
> >>> 5. What are these changes to these poms?
> >>> 
> >>> https://github.com/harsh9t/tinkerpop/commit/cb3b6512ea3536f556108e5a257c4586aa4d157a
> >>> 
> >>> I assume that your IDE did that accidentally and it was not intended.
> >>> Please revert that change.
> >>> 
> >>> 6. This looks odd too - gremlin-shaded repeated again and again and again:
> >>> 
> >>> https://github.com/harsh9t/tinkerpop/commit/143d16f20dcaa9c915b96cdd4adf7b1504db5d36#diff-9e90009f097eabeb25c28159571fc6a2R118
> >>> 
> >>> 7. Did you have any thoughts in reference to Marko's earlier reply that
> >>> described how sparql-gremlin should be used? Right now, it seems like the
> >>> code you have there is just the "engine" but lacks the piece that connects
> >>> it into the rest of the stack. From my perspective, I think we need to be
> >>> sure that users have an easy, clear and consistent way to use
> >>> sparql-gremlin before we can merge this work. Obviously, having that 
> >>> aspect
> >>> of the code thought through will impact the documentation that you write 
> >>> as
> >>> well, so I think you need to go down this path a bit further before we get
> >>> to the pull request stage.
> >>> 
> >>> 8. We aren't big javadoc sticklers here, but we try to at least get class
> >>> level javadoc in place for most classes. I don't see much javadoc or
> >>> comments in the code right now. I think I'd like to see a modicum of
> >>> javadoc/comments present as part of this work.
> >>> 
> >>> So, that's my broad level feedback at this point. It seems as though there
> >>> are some reasonably large issues there to contend with before a pull
> >>> request is worth issuing. That's not a problem, of course....we will just
> >>> keep iterating toward the goal. I'm not aware of anything that is pushing
> >>> us to rush to a pull request - I'm of the opinion that we can take the 
> >>> time
> >>> to get this right.
> >>> 
> >>> Thanks,
> >>> 
> >>> Stephen
> >>> 
> >>> 
> >>> On Fri, Dec 15, 2017 at 1:46 PM, Joshua Shinavier <[email protected]> 
> >>> wrote:
> >>> 
> >>>> Hi Marko,
> >>>> 
> >>>> I think we're more or less on the same page here; it's clear that TP3 
> >>>> has a
> >>>> different API than TP2. If you look at the guts of TP3 GraphSail [1], it
> >>>> uses the modern APIs, and yet does adapt them to the Sail interface.
> >>>> 
> >>>> Something like PropertyGraphSail (or an equivalent Jena thing) still 
> >>>> makes
> >>>> sense in TP3, as well. One interesting detail here is that in TP3, 
> >>>> vertices
> >>>> can have labels, which can be turned into rdf:type statements (that, in
> >>>> turn, can be used to enable subclass/superclass inheritance if the graph 
> >>>> is
> >>>> combined with a RDF schema.
> >>>> 
> >>>> A TP3 equivalent of SailGraph would indeed be quite different in
> >>>> implementation -- strategies, not wrapper graph -- than what we had for
> >>>> Blueprints, and yet would serve the same purpose.
> >>>> 
> >>>> Josh
> >>>> 
> >>>> 
> >>>> [1]
> >>>> https://github.com/joshsh/graphsail/tree/master/src/
> >>>> main/java/net/fortytwo/tpop/sail
> >>>> 
> >>>> 
> >>>> 
> >>>> On Fri, Dec 15, 2017 at 10:22 AM, Marko Rodriguez <[email protected]>
> >>>> wrote:
> >>>> 
> >>>>> Hello,
> >>>>> 
> >>>>> The model proposed below is in-line with TinkerPop2Ã¢ÂÂs way of 
> >>>>> thinking.
> >>>>> Unfortunately, TinkerPop3 and more so for TinkerPop4, the Graph
> >>>> Ã¢ÂÂstructure"
> >>>>> API will become deprecated. This means that the notion of 
> >>>>> Ã¢ÂÂwrapping the
> >>>>> Graph APIÃ¢ÂÂ has gone away for TP3 and will be completely gone in 
> >>>>> TP4. In
> >>>>> TP4, there will not even be a Graph API Ã¢ÂÂ no more Vertex, Edge, 
> >>>>> Property,
> >>>>> etc. Only the concept of a Graph with only methods like
> >>>> Graph.traversal(),
> >>>>> Graph.partitions(), etc.
> >>>>> 
> >>>>> Why was this route taken? In TinkerPop3, there was a need to support any
> >>>>> language besides Java. This was why Gremlin bytecode and the concept of
> >>>> the
> >>>>> Gremlin traversal machine was introduced. A provider simply gets Gremlin
> >>>>> bytecode and has to do something with it. For the Java-based Gremlin
> >>>>> traversal machine, this is why providers implement their own GraphStep,
> >>>>> VertexStep, etc. For a Python-based Gremlin traversal machine, 
> >>>>> likewiseÃ¢ÂÂ¦
> >>>>> 
> >>>>> This means that SailGraph, GraphSail, PropertyGraphSail as stated below
> >>>>> donÃ¢ÂÂt make sense in the current and future architectures.
> >>>>> 
> >>>>> The next question becomes, "well how would you turn an RDF store into a
> >>>>> PropertyGraph?Ã¢ÂÂ Easy Ã¢ÂÂ implement your own custom GraphStep, 
> >>>>> VertexStep,
> >>>>> etc. and respective ProviderStrategies that will handle the bytecode
> >>>>> compilation accordingly.
> >>>>> 
> >>>>> The next question becomes, Ã¢ÂÂwell how would a PropertyGraph support
> >>>>> reasoning?Ã¢ÂÂ Easy Ã¢ÂÂ implement your own custom 
> >>>>> DecorationStrategy that will
> >>>>> insert reasoning into the traversal giving the RDFS schema. For 
> >>>>> instance:
> >>>>>        g.V().out(Ã¢ÂÂlikesÃ¢ÂÂ)
> >>>>>                ==>
> >>>>>        g.V().out(Ã¢ÂÂknowsÃ¢ÂÂ,Ã¢ÂÂlikesÃ¢ÂÂ)
> >>>>>                iff Ã¢ÂÂlikesÃ¢ÂÂ is a sub-property of 
> >>>>> Ã¢ÂÂknowsÃ¢ÂÂ
> >>>>> 
> >>>>> In essence, it is possible to do this integration of RDF and TinkerPop,
> >>>> it
> >>>>> just needs to be done at the correct level of abstraction so that it
> >>>> stays
> >>>>> in line with how TinkerPop is evolving, not how it was back in 2012.
> >>>>> 
> >>>>> Take care,
> >>>>> Marko.
> >>>>> 
> >>>>> http://markorodriguez <http://markorodriguez/>.com
> >>>>> 
> >>>>> 
> >>>>> On 2017-12-13 07:46, Joshua Shinavier <[email protected]> wrote:
> >>>>>> Hi Harsh,>
> >>>>>> 
> >>>>>> Glad you are taking Daniel's work forward. In porting the code to the>
> >>>>>> TinkerPop code base, might I suggest we allow for not only
> >>>>> SPARQL-Gremlin,>
> >>>>>> but a whole suite of RDF tools as in TP2. Perhaps call the module>
> >>>>>> rdf-gremlin. Then we could have all of:>
> >>>>>> 
> >>>>>> * SPARQL-Gremlin: executes standard SPARQL queries over a Property
> >>>> Graph>
> >>>>>> database>
> >>>>>> * GraphSail [1,2]: stores RDF quads in the database, explicitly, and>
> >>>>>> enables SPARQL and triple pattern queries over the quads>
> >>>>>> * PropertyGraphSail [3]: exposes a Property Graph with of two mappings
> >>>>> to>
> >>>>>> the RDF data model>
> >>>>>> * SailGraph [4]: takes an RDF triple store (not natively supporting>
> >>>>>> Gremlin) and enables Gremlin queries>
> >>>>>> * others? I have often thought that a continuous SPARQL implementation>
> >>>>>> built on Gremlin would be powerful>
> >>>>>> 
> >>>>>> The biggest mismatch between the TP2 suite and what might be built for>
> >>>>>> Apache TinkerPop is that the previous suite was implemented using
> >>>>> (Eclipse)>
> >>>>>> RDF4j, whereas things seem to be leaning towards (Apache) Jena now.>
> >>>>>> However, the same principles could be applied.>
> >>>>>> 
> >>>>>> Josh>
> >>>>>> 
> >>>>>> 
> >>>>>> [1] https://github.com/tinkerpop/blueprints/wiki/Sail-Ouplementation>
> >>>>>> [2] https://github.com/joshsh/graphsail>
> >>>>>> [3]>
> >>>>>> https://github.com/tinkerpop/blueprints/wiki/PropertyGraphSail-
> >>>>> Ouplementation>
> >>>>>> [4] https://github.com/tinkerpop/blueprints/wiki/Sail-Implementation
> >>>>> 
> >>>>> http://markorodriguez.com
> > 
> 
>

Re: [Discussed] Integrating SPARQL-Gremlin 0.2 Plugin with the TinkerPop codebase

Reply via email to