Re: [Discussed] Integrating SPARQL-Gremlin 0.2 Plugin with the TinkerPop codebase

Marko Rodriguez Mon, 18 Dec 2017 09:27:15 -0800

Actually, my (3) is bad. Given that query() would always return a Traversal<S, 
Map<String,E>>, it would be necessary to have that linearized to 
Traversal<Vertex,Vertex> for the test suite to validate it. That would mean 
making SPARQLTraversal support extended Traversal methods like flatMap(), blah, 
blah… That seems excessive, though convenient.


Hm…… Thoughts?,
Marko.
 
http://markorodriguez.com



> On Dec 18, 2017, at 9:21 AM, Marko Rodriguez <[email protected]> wrote:
> 
> Hello,
> 
> A couple of items worth considering.
> 
> Regarding (7), that should be done prior to master/ merge. It is necessary to 
> follow the patterns that are established in TinkerPop regarding language 
> interoperability. The DSL pattern developed for Gremlin language variants 
> seems to be the best pattern for distinct languages as well. In essence, if 
> your language is not a fluent language, and instead, uses a String, then it 
> should be wrapped as such in a fluent interface using all the Strategy, Step, 
> and Traversal methods that makes sense so it works within the larger 
> infrastructure of TinkerPop (e.g. testing! — see below). What I proposed in 
> my previous email seems the easiest and cleanest way to do things.
> 
> Regarding (3), testing is crucial. Given that this would be TinkerPop’s first 
> distinct language, we don’t have a pattern set forth for testing. However, 
> this doesn’t mean we can’t improvise on our current model. Off the top of my 
> head, perhaps the best way would be to follow the ProcessTestSuite and do the 
> SPARQL variants of those. For instance:
> 
>       
> https://github.com/apache/tinkerpop/blob/master/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/VertexTest.java#L62
>  
> <https://github.com/apache/tinkerpop/blob/master/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/VertexTest.java#L62>
> 
> The SPARQL test version would be:
> 
> @Override
> public Traversal<Vertex, Vertex> get_g_VX1X_out(final Object v1Id) {
>   return sparql.query(“SELECT ?x WHERE {“ + toURI(v1Id) + “ ?a ?x }”);
> }
> 
> In this way, sparql is your SPARQLTraversalSource for each test and query() 
> will return a Traversal typed according (query() will have to have solid 
> generic support). From there, you would implement each and every test that is 
> semantically possible with SPARQL (where SPARQ won’t be able to semantically 
> cover all Gremlin tests).
> 
> Stephen has done a lot of recent work to generalize our test suite out of 
> Java so it is in a language agnostic form. I haven’t been following that work 
> so I’m not sure what I’m am saying above is exactly as it should be done, but 
> it is a start.
> 
> HTH,
> Marko.
> 
> http://markorodriguez.com <http://markorodriguez.com/>
> 
> 
> 
>> On Dec 18, 2017, at 7:43 AM, Harsh Thakkar <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> Hi Stephen and All,
>> 
>> Thanks for going through the code. I address your questions below (in the 
>> same order):
>> 
>> 1. Yes, this file can be removed. It was just to test the traversal method. 
>> 
>> 2. Yes, I have commented out the block of tests at this moment since we do 
>> not need to run tests at mvn clean install time. However, I kept it (in 
>> commented out form) if there arose a need in future for the same. It can 
>> surely be removed if you think, it won't be necessary.
>> 
>> 3. There were two testing units (we continued them from Daniel's version), 
>> one to check whether the prefixes are being encoded correctly, the second 
>> one is to test whether the generated traversal is correct (in short the 
>> compiler is functioning as it should). Since, we extended previous work 
>> supporting a variety of SPARQL operators, more test cases can be added to 
>> validate that each of these is functioning as expected. However, as I 
>> mentioned in point #2. we need not do it explicitly as we (Dharmen and I) 
>> have already tested them on 3-4 different datasets and query-sets. Now, 
>> since we did not know if that was going to be formally required in the 
>> future or not, we left them as it is, just commented it out.
>> 
>> 4. These resources are the graphml files that we wish to provide the users, 
>> for (i) loading and querying famous datasets - the Berlin SPARQL Benchmark 
>> (BSBM)  (famous in the Semantic Web-RDF community) so that they do not have 
>> to look elsewhere for the same. (ii) Also, it provides a strong use-case for 
>> demonstrating the applicability of sparql-gremlin (creates trust in the SW 
>> community users) and (iii) to keep the plug-in pretty much self-dependent.
>> 
>> 5 & 6  YES, damn it. The IDE did this. I will revert these changes. It's 
>> like when you are not looking, the IDE does things on it own :-/ apologies!
>> 
>> 7. Regarding, Marko's thoughts -- Yes, I was waiting for you to reply to the 
>> thread. I do have some thoughts on this. But first, I was wondering if this 
>> (what Marko suggested) is supposed to be entirely implemented in the current 
>> version of sparql-gremlin 0.2, i.e. including the withStrategies() and 
>> withStrategies() and remote() features, or it is to be supported eventually 
>> (after the sparql-gremlin 0.2.0) plugin is rolled out. Also, I am not 
>> entirely sure I got what Marko was exactly suggesting. I bring this to light 
>> in the in-line style reply to Marko's comment later here.
>> 
>> The current implementation is more of a typical compiler, the users, 
>> however, can use it by specifying the query file and the dataset against 
>> which it is to be executed via the command (once in the gremlin shell):
>> 
>> gremlin> graph = TinkerGraph.open(..) 
>> gremlin> SparqlToGremlinCompiler.convertToGremlinTraversal(graph, "SELECT ?a 
>> WHERE {....} ") 
>> ==>{?x:marko, ?y:29}
>> ==>{?x:josh, ?y:32}
>> 
>> 
>> i.e. load a graph using pre-defined tinkerpop methods ( graph.io 
>> <http://graph.io/>(IoCore.gryo()).readGraph(graphName), TinkerGraph.open(), 
>> etc ) , then execute the traversal as above with arguments -- (graph, 
>> queryString), where queryString = "SPARQL query".
>> 
>> Now Let me quote Marko's comment and reply in-line to bring more clarity:
>> 
>> 1. There should be a SPARQLTraversalSource which supports one spawn method 
>> â query(String).
>>      This is already happening inside the code. Therefore, we do not need to 
>> mention it explicitly. Please correct me if I got it wrong here.
>>       
>> 2. SPARQLTraversal is spawned and it only supports only the Traversal 
>> methods â next(), toList(), iterate(), etc.
>>      All traversal methods that are supported, available to a regular 
>> gremlin traversal, can be used by the sparql-gremlin compiler generated 
>> traversal as well.  
>>      
>> 3. query(String) adds a ConstantStep(String).
>>             This is happening internally (as shown in the example above), we 
>> can also make explicit. i.e. let the user only provide the queryString 
>> instead of the whole 
>> "SparqlToGremlinCompiler.convertToGremlinTraversal(graph, "SELECT ?a WHERE 
>> {....} ")" command. Does this make sense? or am I missing something here.
>> 
>> 
>> 4. SPARQLTraversalSource has a registered SPARQLStrategy.
>>      At this moment, we leave it to the default setting for this strategy 
>> selection.
>> 
>> 5. SPARQLTraversalSource should also support withStrategies(), 
>> withoutStrategies(), withRemote(), etc.
>>      Once the traversal is generated, it can support all strategies like any 
>> other gremlin traversal. Does this make sense to you?
>> 
>> In a nutshell, 
>> What is happening is that we are converting the SPARQL queryString into a 
>> gremlin traversal and leave it upto the tinkerpop compiler to choose what is 
>> best for it. 
>> We only map a SPARQL query to its corresponding pattern matching gremlin 
>> traversal (i.e. using with .match() clause). Since, the expressibility of 
>> SPARQL is less than that of Gremlin (i.e. SPARQL 1.0 doesn't support/allow  
>> performing looping and traversing operations), we can only map what is in 
>> the scope of SPARQL language to Gremlin. Once the traversal is generated, it 
>> is left to the tinkerpop compiler to select and execute a wide range of 
>> strategies ( various levels of optimizations, et). 
>> 
>> NOTE - Also, Right now the sparql-gremlin compiler returns the traversal 
>> (string) and not Bytecode. Returning the Bytecode is also completely 
>> possible, if you want so. We can just perform 
>> traversal.asAdmin().getBytecode() for this and it is done.
>> 
>> Since, we extended Daniel's work, we have not changed the names of classes, 
>> methods and variable which were used. This, however, can be changed, if you 
>> suggest so. 
>> 
>> 8. Yes, working in the academia doesn't groom you much on the importance of 
>> commenting in the code by default, or for that matter any "good-practices". 
>> I will add appropriate comment block in each class for the javadocs.
>> 
>> I hope the above reply address your questions to quite some extent.  Most of 
>> the issues are already handled internally (as I stated above). We can also 
>> leave some advanced features such as remote(), for the 0.2.1 release (though 
>> this is just an option) :D  
>> Having said that, Of course, we are in no hurry for the pull request. I also 
>> believe it makes complete sense to get things right. 
>> 
>> Cheers!
>> 
>> On 2017-12-18 14:11, Stephen Mallette <[email protected] 
>> <mailto:[email protected]>> wrote: 
>>> Harsh, I looked at the code in a bit more detail than I have. Here's some
>>> thoughts/questions I had as I was going through things:
>>> 
>>> 1. Can this file be removed - it doesn't appear to have any usage that I
>>> can see:
>>> 
>>> https://github.com/harsh9t/tinkerpop/blob/master/sparql-gremlin/src/main/java/org/apache/tinkerpop/gremlin/sparql/Runable.java
>>>  
>>> <https://github.com/harsh9t/tinkerpop/blob/master/sparql-gremlin/src/main/java/org/apache/tinkerpop/gremlin/sparql/Runable.java>
>>> 
>>> 2. I note that this entire block of tests is commented out - should that be
>>> removed?:
>>> 
>>> https://github.com/harsh9t/tinkerpop/blob/master/sparql-gremlin/src/test/java/org/apache/tinkerpop/gremlin/sparql/SparqlToGremlinCompilerTest.java
>>> 
>>> 3. I could be wrong, but even if you didn't remove the tests above, it
>>> seems like unit testing is rather thin at this point. Am I missing
>>> something? Is there more work to do there?
>>> 
>>> 4. I don't understand the nature of these resources:
>>> 
>>> https://github.com/harsh9t/tinkerpop/tree/master/sparql-gremlin/src/main/resources
>>> 
>>> Is there any need to package those with the jar? Should those be "test"
>>> resources instead? Do we need the really large data/bsbm1m.graphml file for
>>> any specific reason?
>>> 
>>> 5. What are these changes to these poms?
>>> 
>>> https://github.com/harsh9t/tinkerpop/commit/cb3b6512ea3536f556108e5a257c4586aa4d157a
>>> 
>>> I assume that your IDE did that accidentally and it was not intended.
>>> Please revert that change.
>>> 
>>> 6. This looks odd too - gremlin-shaded repeated again and again and again:
>>> 
>>> https://github.com/harsh9t/tinkerpop/commit/143d16f20dcaa9c915b96cdd4adf7b1504db5d36#diff-9e90009f097eabeb25c28159571fc6a2R118
>>> 
>>> 7. Did you have any thoughts in reference to Marko's earlier reply that
>>> described how sparql-gremlin should be used? Right now, it seems like the
>>> code you have there is just the "engine" but lacks the piece that connects
>>> it into the rest of the stack. From my perspective, I think we need to be
>>> sure that users have an easy, clear and consistent way to use
>>> sparql-gremlin before we can merge this work. Obviously, having that aspect
>>> of the code thought through will impact the documentation that you write as
>>> well, so I think you need to go down this path a bit further before we get
>>> to the pull request stage.
>>> 
>>> 8. We aren't big javadoc sticklers here, but we try to at least get class
>>> level javadoc in place for most classes. I don't see much javadoc or
>>> comments in the code right now. I think I'd like to see a modicum of
>>> javadoc/comments present as part of this work.
>>> 
>>> So, that's my broad level feedback at this point. It seems as though there
>>> are some reasonably large issues there to contend with before a pull
>>> request is worth issuing. That's not a problem, of course....we will just
>>> keep iterating toward the goal. I'm not aware of anything that is pushing
>>> us to rush to a pull request - I'm of the opinion that we can take the time
>>> to get this right.
>>> 
>>> Thanks,
>>> 
>>> Stephen
>>> 
>>> 
>>> On Fri, Dec 15, 2017 at 1:46 PM, Joshua Shinavier <[email protected]> wrote:
>>> 
>>>> Hi Marko,
>>>> 
>>>> I think we're more or less on the same page here; it's clear that TP3 has a
>>>> different API than TP2. If you look at the guts of TP3 GraphSail [1], it
>>>> uses the modern APIs, and yet does adapt them to the Sail interface.
>>>> 
>>>> Something like PropertyGraphSail (or an equivalent Jena thing) still makes
>>>> sense in TP3, as well. One interesting detail here is that in TP3, vertices
>>>> can have labels, which can be turned into rdf:type statements (that, in
>>>> turn, can be used to enable subclass/superclass inheritance if the graph is
>>>> combined with a RDF schema.
>>>> 
>>>> A TP3 equivalent of SailGraph would indeed be quite different in
>>>> implementation -- strategies, not wrapper graph -- than what we had for
>>>> Blueprints, and yet would serve the same purpose.
>>>> 
>>>> Josh
>>>> 
>>>> 
>>>> [1]
>>>> https://github.com/joshsh/graphsail/tree/master/src/
>>>> main/java/net/fortytwo/tpop/sail
>>>> 
>>>> 
>>>> 
>>>> On Fri, Dec 15, 2017 at 10:22 AM, Marko Rodriguez <[email protected]>
>>>> wrote:
>>>> 
>>>>> Hello,
>>>>> 
>>>>> The model proposed below is in-line with TinkerPop2âs way of thinking.
>>>>> Unfortunately, TinkerPop3 and more so for TinkerPop4, the Graph
>>>> âstructure"
>>>>> API will become deprecated. This means that the notion of âwrapping the
>>>>> Graph APIâ has gone away for TP3 and will be completely gone in TP4. In
>>>>> TP4, there will not even be a Graph API â no more Vertex, Edge, 
>>>>> Property,
>>>>> etc. Only the concept of a Graph with only methods like
>>>> Graph.traversal(),
>>>>> Graph.partitions(), etc.
>>>>> 
>>>>> Why was this route taken? In TinkerPop3, there was a need to support any
>>>>> language besides Java. This was why Gremlin bytecode and the concept of
>>>> the
>>>>> Gremlin traversal machine was introduced. A provider simply gets Gremlin
>>>>> bytecode and has to do something with it. For the Java-based Gremlin
>>>>> traversal machine, this is why providers implement their own GraphStep,
>>>>> VertexStep, etc. For a Python-based Gremlin traversal machine, likewiseâ¦
>>>>> 
>>>>> This means that SailGraph, GraphSail, PropertyGraphSail as stated below
>>>>> donât make sense in the current and future architectures.
>>>>> 
>>>>> The next question becomes, "well how would you turn an RDF store into a
>>>>> PropertyGraph?â Easy â implement your own custom GraphStep, 
>>>>> VertexStep,
>>>>> etc. and respective ProviderStrategies that will handle the bytecode
>>>>> compilation accordingly.
>>>>> 
>>>>> The next question becomes, âwell how would a PropertyGraph support
>>>>> reasoning?â Easy â implement your own custom DecorationStrategy that 
>>>>> will
>>>>> insert reasoning into the traversal giving the RDFS schema. For instance:
>>>>>        g.V().out(âlikesâ)
>>>>>                ==>
>>>>>        g.V().out(âknowsâ,âlikesâ)
>>>>>                iff âlikesâ is a sub-property of âknowsâ
>>>>> 
>>>>> In essence, it is possible to do this integration of RDF and TinkerPop,
>>>> it
>>>>> just needs to be done at the correct level of abstraction so that it
>>>> stays
>>>>> in line with how TinkerPop is evolving, not how it was back in 2012.
>>>>> 
>>>>> Take care,
>>>>> Marko.
>>>>> 
>>>>> http://markorodriguez <http://markorodriguez/>.com
>>>>> 
>>>>> 
>>>>> On 2017-12-13 07:46, Joshua Shinavier <[email protected]> wrote:
>>>>>> Hi Harsh,>
>>>>>> 
>>>>>> Glad you are taking Daniel's work forward. In porting the code to the>
>>>>>> TinkerPop code base, might I suggest we allow for not only
>>>>> SPARQL-Gremlin,>
>>>>>> but a whole suite of RDF tools as in TP2. Perhaps call the module>
>>>>>> rdf-gremlin. Then we could have all of:>
>>>>>> 
>>>>>> * SPARQL-Gremlin: executes standard SPARQL queries over a Property
>>>> Graph>
>>>>>> database>
>>>>>> * GraphSail [1,2]: stores RDF quads in the database, explicitly, and>
>>>>>> enables SPARQL and triple pattern queries over the quads>
>>>>>> * PropertyGraphSail [3]: exposes a Property Graph with of two mappings
>>>>> to>
>>>>>> the RDF data model>
>>>>>> * SailGraph [4]: takes an RDF triple store (not natively supporting>
>>>>>> Gremlin) and enables Gremlin queries>
>>>>>> * others? I have often thought that a continuous SPARQL implementation>
>>>>>> built on Gremlin would be powerful>
>>>>>> 
>>>>>> The biggest mismatch between the TP2 suite and what might be built for>
>>>>>> Apache TinkerPop is that the previous suite was implemented using
>>>>> (Eclipse)>
>>>>>> RDF4j, whereas things seem to be leaning towards (Apache) Jena now.>
>>>>>> However, the same principles could be applied.>
>>>>>> 
>>>>>> Josh>
>>>>>> 
>>>>>> 
>>>>>> [1] https://github.com/tinkerpop/blueprints/wiki/Sail-Ouplementation>
>>>>>> [2] https://github.com/joshsh/graphsail>
>>>>>> [3]>
>>>>>> https://github.com/tinkerpop/blueprints/wiki/PropertyGraphSail-
>>>>> Ouplementation>
>>>>>> [4] https://github.com/tinkerpop/blueprints/wiki/Sail-Implementation
>>>>> 
>>>>> http://markorodriguez.com
>

Re: [Discussed] Integrating SPARQL-Gremlin 0.2 Plugin with the TinkerPop codebase

Reply via email to