Actually, my (3) is bad. Given that query() would always return a Traversal<S, Map<String,E>>, it would be necessary to have that linearized to Traversal<Vertex,Vertex> for the test suite to validate it. That would mean making SPARQLTraversal support extended Traversal methods like flatMap(), blah, blah… That seems excessive, though convenient.
Hm…… Thoughts?, Marko. http://markorodriguez.com > On Dec 18, 2017, at 9:21 AM, Marko Rodriguez <[email protected]> wrote: > > Hello, > > A couple of items worth considering. > > Regarding (7), that should be done prior to master/ merge. It is necessary to > follow the patterns that are established in TinkerPop regarding language > interoperability. The DSL pattern developed for Gremlin language variants > seems to be the best pattern for distinct languages as well. In essence, if > your language is not a fluent language, and instead, uses a String, then it > should be wrapped as such in a fluent interface using all the Strategy, Step, > and Traversal methods that makes sense so it works within the larger > infrastructure of TinkerPop (e.g. testing! — see below). What I proposed in > my previous email seems the easiest and cleanest way to do things. > > Regarding (3), testing is crucial. Given that this would be TinkerPop’s first > distinct language, we don’t have a pattern set forth for testing. However, > this doesn’t mean we can’t improvise on our current model. Off the top of my > head, perhaps the best way would be to follow the ProcessTestSuite and do the > SPARQL variants of those. For instance: > > > https://github.com/apache/tinkerpop/blob/master/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/VertexTest.java#L62 > > <https://github.com/apache/tinkerpop/blob/master/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/VertexTest.java#L62> > > The SPARQL test version would be: > > @Override > public Traversal<Vertex, Vertex> get_g_VX1X_out(final Object v1Id) { > return sparql.query(“SELECT ?x WHERE {“ + toURI(v1Id) + “ ?a ?x }”); > } > > In this way, sparql is your SPARQLTraversalSource for each test and query() > will return a Traversal typed according (query() will have to have solid > generic support). From there, you would implement each and every test that is > semantically possible with SPARQL (where SPARQ won’t be able to semantically > cover all Gremlin tests). > > Stephen has done a lot of recent work to generalize our test suite out of > Java so it is in a language agnostic form. I haven’t been following that work > so I’m not sure what I’m am saying above is exactly as it should be done, but > it is a start. > > HTH, > Marko. > > http://markorodriguez.com <http://markorodriguez.com/> > > > >> On Dec 18, 2017, at 7:43 AM, Harsh Thakkar <[email protected] >> <mailto:[email protected]>> wrote: >> >> Hi Stephen and All, >> >> Thanks for going through the code. I address your questions below (in the >> same order): >> >> 1. Yes, this file can be removed. It was just to test the traversal method. >> >> 2. Yes, I have commented out the block of tests at this moment since we do >> not need to run tests at mvn clean install time. However, I kept it (in >> commented out form) if there arose a need in future for the same. It can >> surely be removed if you think, it won't be necessary. >> >> 3. There were two testing units (we continued them from Daniel's version), >> one to check whether the prefixes are being encoded correctly, the second >> one is to test whether the generated traversal is correct (in short the >> compiler is functioning as it should). Since, we extended previous work >> supporting a variety of SPARQL operators, more test cases can be added to >> validate that each of these is functioning as expected. However, as I >> mentioned in point #2. we need not do it explicitly as we (Dharmen and I) >> have already tested them on 3-4 different datasets and query-sets. Now, >> since we did not know if that was going to be formally required in the >> future or not, we left them as it is, just commented it out. >> >> 4. These resources are the graphml files that we wish to provide the users, >> for (i) loading and querying famous datasets - the Berlin SPARQL Benchmark >> (BSBM) (famous in the Semantic Web-RDF community) so that they do not have >> to look elsewhere for the same. (ii) Also, it provides a strong use-case for >> demonstrating the applicability of sparql-gremlin (creates trust in the SW >> community users) and (iii) to keep the plug-in pretty much self-dependent. >> >> 5 & 6 YES, damn it. The IDE did this. I will revert these changes. It's >> like when you are not looking, the IDE does things on it own :-/ apologies! >> >> 7. Regarding, Marko's thoughts -- Yes, I was waiting for you to reply to the >> thread. I do have some thoughts on this. But first, I was wondering if this >> (what Marko suggested) is supposed to be entirely implemented in the current >> version of sparql-gremlin 0.2, i.e. including the withStrategies() and >> withStrategies() and remote() features, or it is to be supported eventually >> (after the sparql-gremlin 0.2.0) plugin is rolled out. Also, I am not >> entirely sure I got what Marko was exactly suggesting. I bring this to light >> in the in-line style reply to Marko's comment later here. >> >> The current implementation is more of a typical compiler, the users, >> however, can use it by specifying the query file and the dataset against >> which it is to be executed via the command (once in the gremlin shell): >> >> gremlin> graph = TinkerGraph.open(..) >> gremlin> SparqlToGremlinCompiler.convertToGremlinTraversal(graph, "SELECT ?a >> WHERE {....} ") >> ==>{?x:marko, ?y:29} >> ==>{?x:josh, ?y:32} >> >> >> i.e. load a graph using pre-defined tinkerpop methods ( graph.io >> <http://graph.io/>(IoCore.gryo()).readGraph(graphName), TinkerGraph.open(), >> etc ) , then execute the traversal as above with arguments -- (graph, >> queryString), where queryString = "SPARQL query". >> >> Now Let me quote Marko's comment and reply in-line to bring more clarity: >> >> 1. There should be a SPARQLTraversalSource which supports one spawn method >> â query(String). >> This is already happening inside the code. Therefore, we do not need to >> mention it explicitly. Please correct me if I got it wrong here. >> >> 2. SPARQLTraversal is spawned and it only supports only the Traversal >> methods â next(), toList(), iterate(), etc. >> All traversal methods that are supported, available to a regular >> gremlin traversal, can be used by the sparql-gremlin compiler generated >> traversal as well. >> >> 3. query(String) adds a ConstantStep(String). >> This is happening internally (as shown in the example above), we >> can also make explicit. i.e. let the user only provide the queryString >> instead of the whole >> "SparqlToGremlinCompiler.convertToGremlinTraversal(graph, "SELECT ?a WHERE >> {....} ")" command. Does this make sense? or am I missing something here. >> >> >> 4. SPARQLTraversalSource has a registered SPARQLStrategy. >> At this moment, we leave it to the default setting for this strategy >> selection. >> >> 5. SPARQLTraversalSource should also support withStrategies(), >> withoutStrategies(), withRemote(), etc. >> Once the traversal is generated, it can support all strategies like any >> other gremlin traversal. Does this make sense to you? >> >> In a nutshell, >> What is happening is that we are converting the SPARQL queryString into a >> gremlin traversal and leave it upto the tinkerpop compiler to choose what is >> best for it. >> We only map a SPARQL query to its corresponding pattern matching gremlin >> traversal (i.e. using with .match() clause). Since, the expressibility of >> SPARQL is less than that of Gremlin (i.e. SPARQL 1.0 doesn't support/allow >> performing looping and traversing operations), we can only map what is in >> the scope of SPARQL language to Gremlin. Once the traversal is generated, it >> is left to the tinkerpop compiler to select and execute a wide range of >> strategies ( various levels of optimizations, et). >> >> NOTE - Also, Right now the sparql-gremlin compiler returns the traversal >> (string) and not Bytecode. Returning the Bytecode is also completely >> possible, if you want so. We can just perform >> traversal.asAdmin().getBytecode() for this and it is done. >> >> Since, we extended Daniel's work, we have not changed the names of classes, >> methods and variable which were used. This, however, can be changed, if you >> suggest so. >> >> 8. Yes, working in the academia doesn't groom you much on the importance of >> commenting in the code by default, or for that matter any "good-practices". >> I will add appropriate comment block in each class for the javadocs. >> >> I hope the above reply address your questions to quite some extent. Most of >> the issues are already handled internally (as I stated above). We can also >> leave some advanced features such as remote(), for the 0.2.1 release (though >> this is just an option) :D >> Having said that, Of course, we are in no hurry for the pull request. I also >> believe it makes complete sense to get things right. >> >> Cheers! >> >> On 2017-12-18 14:11, Stephen Mallette <[email protected] >> <mailto:[email protected]>> wrote: >>> Harsh, I looked at the code in a bit more detail than I have. Here's some >>> thoughts/questions I had as I was going through things: >>> >>> 1. Can this file be removed - it doesn't appear to have any usage that I >>> can see: >>> >>> https://github.com/harsh9t/tinkerpop/blob/master/sparql-gremlin/src/main/java/org/apache/tinkerpop/gremlin/sparql/Runable.java >>> >>> <https://github.com/harsh9t/tinkerpop/blob/master/sparql-gremlin/src/main/java/org/apache/tinkerpop/gremlin/sparql/Runable.java> >>> >>> 2. I note that this entire block of tests is commented out - should that be >>> removed?: >>> >>> https://github.com/harsh9t/tinkerpop/blob/master/sparql-gremlin/src/test/java/org/apache/tinkerpop/gremlin/sparql/SparqlToGremlinCompilerTest.java >>> >>> 3. I could be wrong, but even if you didn't remove the tests above, it >>> seems like unit testing is rather thin at this point. Am I missing >>> something? Is there more work to do there? >>> >>> 4. I don't understand the nature of these resources: >>> >>> https://github.com/harsh9t/tinkerpop/tree/master/sparql-gremlin/src/main/resources >>> >>> Is there any need to package those with the jar? Should those be "test" >>> resources instead? Do we need the really large data/bsbm1m.graphml file for >>> any specific reason? >>> >>> 5. What are these changes to these poms? >>> >>> https://github.com/harsh9t/tinkerpop/commit/cb3b6512ea3536f556108e5a257c4586aa4d157a >>> >>> I assume that your IDE did that accidentally and it was not intended. >>> Please revert that change. >>> >>> 6. This looks odd too - gremlin-shaded repeated again and again and again: >>> >>> https://github.com/harsh9t/tinkerpop/commit/143d16f20dcaa9c915b96cdd4adf7b1504db5d36#diff-9e90009f097eabeb25c28159571fc6a2R118 >>> >>> 7. Did you have any thoughts in reference to Marko's earlier reply that >>> described how sparql-gremlin should be used? Right now, it seems like the >>> code you have there is just the "engine" but lacks the piece that connects >>> it into the rest of the stack. From my perspective, I think we need to be >>> sure that users have an easy, clear and consistent way to use >>> sparql-gremlin before we can merge this work. Obviously, having that aspect >>> of the code thought through will impact the documentation that you write as >>> well, so I think you need to go down this path a bit further before we get >>> to the pull request stage. >>> >>> 8. We aren't big javadoc sticklers here, but we try to at least get class >>> level javadoc in place for most classes. I don't see much javadoc or >>> comments in the code right now. I think I'd like to see a modicum of >>> javadoc/comments present as part of this work. >>> >>> So, that's my broad level feedback at this point. It seems as though there >>> are some reasonably large issues there to contend with before a pull >>> request is worth issuing. That's not a problem, of course....we will just >>> keep iterating toward the goal. I'm not aware of anything that is pushing >>> us to rush to a pull request - I'm of the opinion that we can take the time >>> to get this right. >>> >>> Thanks, >>> >>> Stephen >>> >>> >>> On Fri, Dec 15, 2017 at 1:46 PM, Joshua Shinavier <[email protected]> wrote: >>> >>>> Hi Marko, >>>> >>>> I think we're more or less on the same page here; it's clear that TP3 has a >>>> different API than TP2. If you look at the guts of TP3 GraphSail [1], it >>>> uses the modern APIs, and yet does adapt them to the Sail interface. >>>> >>>> Something like PropertyGraphSail (or an equivalent Jena thing) still makes >>>> sense in TP3, as well. One interesting detail here is that in TP3, vertices >>>> can have labels, which can be turned into rdf:type statements (that, in >>>> turn, can be used to enable subclass/superclass inheritance if the graph is >>>> combined with a RDF schema. >>>> >>>> A TP3 equivalent of SailGraph would indeed be quite different in >>>> implementation -- strategies, not wrapper graph -- than what we had for >>>> Blueprints, and yet would serve the same purpose. >>>> >>>> Josh >>>> >>>> >>>> [1] >>>> https://github.com/joshsh/graphsail/tree/master/src/ >>>> main/java/net/fortytwo/tpop/sail >>>> >>>> >>>> >>>> On Fri, Dec 15, 2017 at 10:22 AM, Marko Rodriguez <[email protected]> >>>> wrote: >>>> >>>>> Hello, >>>>> >>>>> The model proposed below is in-line with TinkerPop2âs way of thinking. >>>>> Unfortunately, TinkerPop3 and more so for TinkerPop4, the Graph >>>> âstructure" >>>>> API will become deprecated. This means that the notion of âwrapping the >>>>> Graph APIâ has gone away for TP3 and will be completely gone in TP4. In >>>>> TP4, there will not even be a Graph API â no more Vertex, Edge, >>>>> Property, >>>>> etc. Only the concept of a Graph with only methods like >>>> Graph.traversal(), >>>>> Graph.partitions(), etc. >>>>> >>>>> Why was this route taken? In TinkerPop3, there was a need to support any >>>>> language besides Java. This was why Gremlin bytecode and the concept of >>>> the >>>>> Gremlin traversal machine was introduced. A provider simply gets Gremlin >>>>> bytecode and has to do something with it. For the Java-based Gremlin >>>>> traversal machine, this is why providers implement their own GraphStep, >>>>> VertexStep, etc. For a Python-based Gremlin traversal machine, likewise⦠>>>>> >>>>> This means that SailGraph, GraphSail, PropertyGraphSail as stated below >>>>> donât make sense in the current and future architectures. >>>>> >>>>> The next question becomes, "well how would you turn an RDF store into a >>>>> PropertyGraph?â Easy â implement your own custom GraphStep, >>>>> VertexStep, >>>>> etc. and respective ProviderStrategies that will handle the bytecode >>>>> compilation accordingly. >>>>> >>>>> The next question becomes, âwell how would a PropertyGraph support >>>>> reasoning?â Easy â implement your own custom DecorationStrategy that >>>>> will >>>>> insert reasoning into the traversal giving the RDFS schema. For instance: >>>>> g.V().out(âlikesâ) >>>>> ==> >>>>> g.V().out(âknowsâ,âlikesâ) >>>>> iff âlikesâ is a sub-property of âknowsâ >>>>> >>>>> In essence, it is possible to do this integration of RDF and TinkerPop, >>>> it >>>>> just needs to be done at the correct level of abstraction so that it >>>> stays >>>>> in line with how TinkerPop is evolving, not how it was back in 2012. >>>>> >>>>> Take care, >>>>> Marko. >>>>> >>>>> http://markorodriguez <http://markorodriguez/>.com >>>>> >>>>> >>>>> On 2017-12-13 07:46, Joshua Shinavier <[email protected]> wrote: >>>>>> Hi Harsh,> >>>>>> >>>>>> Glad you are taking Daniel's work forward. In porting the code to the> >>>>>> TinkerPop code base, might I suggest we allow for not only >>>>> SPARQL-Gremlin,> >>>>>> but a whole suite of RDF tools as in TP2. Perhaps call the module> >>>>>> rdf-gremlin. Then we could have all of:> >>>>>> >>>>>> * SPARQL-Gremlin: executes standard SPARQL queries over a Property >>>> Graph> >>>>>> database> >>>>>> * GraphSail [1,2]: stores RDF quads in the database, explicitly, and> >>>>>> enables SPARQL and triple pattern queries over the quads> >>>>>> * PropertyGraphSail [3]: exposes a Property Graph with of two mappings >>>>> to> >>>>>> the RDF data model> >>>>>> * SailGraph [4]: takes an RDF triple store (not natively supporting> >>>>>> Gremlin) and enables Gremlin queries> >>>>>> * others? I have often thought that a continuous SPARQL implementation> >>>>>> built on Gremlin would be powerful> >>>>>> >>>>>> The biggest mismatch between the TP2 suite and what might be built for> >>>>>> Apache TinkerPop is that the previous suite was implemented using >>>>> (Eclipse)> >>>>>> RDF4j, whereas things seem to be leaning towards (Apache) Jena now.> >>>>>> However, the same principles could be applied.> >>>>>> >>>>>> Josh> >>>>>> >>>>>> >>>>>> [1] https://github.com/tinkerpop/blueprints/wiki/Sail-Ouplementation> >>>>>> [2] https://github.com/joshsh/graphsail> >>>>>> [3]> >>>>>> https://github.com/tinkerpop/blueprints/wiki/PropertyGraphSail- >>>>> Ouplementation> >>>>>> [4] https://github.com/tinkerpop/blueprints/wiki/Sail-Implementation >>>>> >>>>> http://markorodriguez.com >
