Hello all,

I would like to take some concrete steps toward the TinkerPop 4 interoperability goals I've stated a few times (e.g. see TinkerPop 2020 <https://www.slideshare.net/joshsh/tinkerpop-2020> from last year). At a meetup <https://www.meetup.com/Category-Theory/events/277331504/> a couple of months ago, I demonstrated an approach for generating TinkerPop APIs consistently into different languages. I have started to check in some of that generated code in a branch (see my commits here <https://github.com/apache/tinkerpop/commits/TINKERPOP-2563-language/gremlin-language>) and to add bits and pieces for RDF support as well.
The Apache Software Foundation asks us to discuss any significant changes to the code base on the dev list. Since these steps toward TP4 will be major changes if and when they are merged into the master branch, I will start discussing them here. Expect occasional emails from me about the various things I will be doing in the branch. I absolutely invite comments, feedback, and actual discussion on these design proposals, but even if it's just me issuing self-affirming statements into the void like the King of Pointland, I will just carry on, because that's how this process works.

A brief summary of the changes so far:

- *Abstract specification of Gremlin traversals*. I have turned Stephen's Gremlin.g4 <https://github.com/apache/tinkerpop/blob/TINKERPOP-2563-language/gremlin-language/src/main/antlr4/Gremlin.g4> ANTLR grammar into an abstract specification of Gremlin traversal syntax using the Dragon (YAML-based) format. Unfortunately, it is looking very unlikely that Dragon will become available as open-source software, so you can expect this YAML format to change slightly once we have a new Dragon-like tool for schema and data transformations. More on that later. Right now, the syntax specification can be found here <https://github.com/apache/tinkerpop/tree/TINKERPOP-2563-language/gremlin-language/src/main/yaml/org/apache/tinkerpop/gremlin/language/model>, although the file path might change in the future.

- *Traversal DTOs*. Based on the abstract specification, I have generated Java classes for building and working with traversals. The generated files can currently be found here <https://github.com/apache/tinkerpop/tree/TINKERPOP-2563-language/gremlin-language/src/gen/java/org/apache/tinkerpop/gremlin/language/model>. These are essentially POJOs or DTO classes, with special boilerplate methods for equality, pattern matching over alternative constructors, and modification by copying (since the instances are immutable).
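To make the shape of such classes a little more concrete, here is a minimal hand-written sketch in Java. It is not the generated code: `Step`, `Out`, `Has`, `StepPrinter`, and the `with...` methods are hypothetical names chosen for illustration, and a visitor stands in for "pattern matching over alternative constructors". The real generated classes may look quite different.

```java
import java.util.Objects;

// Hypothetical sketch only; these names are NOT taken from the generated TinkerPop code.
abstract class Step {
    // Visitor interface: one method per alternative constructor of Step.
    interface Visitor<R> {
        R visit(Out step);
        R visit(Has step);
    }

    abstract <R> R accept(Visitor<R> visitor);

    // Alternative 1: follow outgoing edges with a given label.
    static final class Out extends Step {
        final String edgeLabel;

        Out(String edgeLabel) {
            this.edgeLabel = Objects.requireNonNull(edgeLabel);
        }

        // Modification by copying: the receiver is left unchanged.
        Out withEdgeLabel(String newLabel) {
            return new Out(newLabel);
        }

        @Override
        <R> R accept(Visitor<R> visitor) {
            return visitor.visit(this);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Out && edgeLabel.equals(((Out) o).edgeLabel);
        }

        @Override
        public int hashCode() {
            return edgeLabel.hashCode();
        }
    }

    // Alternative 2: filter on a property key/value pair.
    static final class Has extends Step {
        final String key;
        final String value;

        Has(String key, String value) {
            this.key = Objects.requireNonNull(key);
            this.value = Objects.requireNonNull(value);
        }

        // Modification by copying: returns a new instance with only the value replaced.
        Has withValue(String newValue) {
            return new Has(key, newValue);
        }

        @Override
        <R> R accept(Visitor<R> visitor) {
            return visitor.visit(this);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Has)) {
                return false;
            }
            Has other = (Has) o;
            return key.equals(other.key) && value.equals(other.value);
        }

        @Override
        public int hashCode() {
            return Objects.hash(key, value);
        }
    }
}

// Logic lives outside the data classes: this visitor renders a step as Gremlin-like text.
final class StepPrinter implements Step.Visitor<String> {
    @Override
    public String visit(Step.Out step) {
        return "out('" + step.edgeLabel + "')";
    }

    @Override
    public String visit(Step.Has step) {
        return "has('" + step.key + "', '" + step.value + "')";
    }
}
```

With these definitions, `new Step.Has("name", "marko").withValue("josh")` yields a new `Has` that equals `new Step.Has("name", "josh")` while the original instance is untouched, and `step.accept(new StepPrinter())` dispatches on whichever alternative `step` happens to be.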
These classes allow you to build traversals in a declarative way, while all of the logic for evaluating traversals goes elsewhere. Support for serialization and deserialization of traversals is to be added in the future -- and the same goes for all other classes generated in this way.

- *RDF 1.1 concepts model*. RDF support was part of TinkerPop from the beginning, but it was de-emphasized for TinkerPop 3 due to other priorities such as OLAP. For years, developers have been asking us for better interoperability with RDF. While we do have some query-level support for RDF these days in sparql-gremlin, we no longer have any data-level support, such as loading RDF data into a property graph and getting it back out, or evaluating Gremlin traversals over RDF datasets. These things are not especially hard to do, in certain limited ways, but our old approach of writing adapters like GraphSail <https://github.com/tinkerpop/blueprints/wiki/Sail-Ouplementation>, SailGraph <https://github.com/tinkerpop/blueprints/wiki/Sail-Implementation>, and PropertyGraphSail <https://github.com/tinkerpop/blueprints/wiki/PropertyGraphSail-Ouplementation> in Java, with no support for other languages, does not seem appropriate for TinkerPop 4. Also, those early mappings were extremely underspecified in a formal sense -- good enough for some practical applications, but not good enough for anything requiring inference, optimization, or composition with other mappings. To address both problems, I am starting to add abstract specifications for RDF along the lines of the Gremlin specifications I described above.
The first of these, a specification of RDF 1.1 Concepts, can currently be found here <https://github.com/apache/tinkerpop/blob/TINKERPOP-2563-language/gremlin-language/src/main/yaml/org/apache/tinkerpop/rdf/rdf11concepts.yaml>, with generated Java classes here <https://github.com/apache/tinkerpop/tree/TINKERPOP-2563-language/gremlin-language/src/gen/java/org/apache/tinkerpop/rdf/rdf11concepts>. This gives us a way of working with RDF data in a language-neutral and framework-neutral way (whereas we were previously tied to Java and to the RDF4J, née Sesame, API). Mappings into and out of RDF will be defined with respect to these abstract types, which can easily be adapted to native RDF APIs in whatever language you happen to be working in.

I will write more about the above topics as time goes by and I continue adding code to the branch. Happy to answer any questions or discuss any feedback in the meantime.

Josh