Awesome, thanks Marko! Good point, I'll try to explain my reasoning behind not using Sqlg (even though it seems like a great project on its own). I'd be happy to receive any feedback on it.
First, a bit about the motivation behind Unipop.

Unipop is meant to be a DAL (data access layer) on top of any databases of your choice. The philosophy is that these days many organizations (like the one I work for) have a lot of different "kinds" of data, spread across many specialized data stores (RDBMS, document stores, etc.). What we wanted to build is a DAL that lets us query all our different data stores and hundreds of different schemas, including the relationships between the data, through one simple interface.

There are some projects that try to do the same thing (Drill <http://drill.apache.org/>, Calcite <https://calcite.incubator.apache.org/>, Dremel <http://research.google.com/pubs/pub36632.html>), but they use SQL as the "unified" query language. We figured that in a schema with many connections, a property-graph representation would be better than a relational model (trying to avoid "JOIN hell"). So we decided to implement a Calcite-like application using Gremlin: Unipop.

On the issue of using Sqlg, there were a few design decisions we made in Unipop that seemed to go against it:

1. The graph ontology should not depend on the underlying schemas. One could choose to represent a table in a database as a vertex, or as a vertex + edge (represented by some FK column). You might even choose to make a "virtual" vertex (let's say an 'email-address' vertex) that isn't represented anywhere physically, but is used as a connection point between other vertices in our ontology (e.g. the user's posts, each stored as a document in Elasticsearch). Basically, we shouldn't bind the design of our "user-facing" ontology to the design of our optimized data store schemas.
   - OTOH, in Sqlg the schema is (understandably) mapped directly to the graph ontology <http://umlg.org/sqlg.html> (take a look at the Architecture section).

2. We must be able to query multiple different data stores in the same traversal, and even in the same step.
   Practically, that meant that instead of implementing the process package (Steps, Strategies, etc.) for each data store, we made one implementation that coordinates the different Controllers (elastic, jdbc, etc.).
   - Before starting work on the jdbc package I scanned through the Sqlg code, and (again, understandably) the code seemed heavily dependent on the process package.

3. Translating Gremlin's in/out steps to JOIN statements is a big pain. It's probably the hardest part of creating an SQL implementation. We figured that for Unipop we'd just bypass that problem: create the JOINs we need as views in the DB, and simply map those views to the vertices & edges they correspond to in the graph ontology. (This explanation might not be too clear; I can expand on it if anyone's interested.)

The reason for going into these details is that I'd be happy to get a second opinion from you guys, about using Sqlg in particular and about the design decisions in general.

BTW, the same points are probably relevant to using Titan's Cassandra/HBase/etc. connectors.

Thanks,
Ran

On Tue, 27 Oct 2015 at 16:58 Marko Rodriguez <[email protected]> wrote:

> Hi Ran,
>
> I just submitted a PR to your Unipop project.
>
> https://github.com/rmagen/unipop/pull/3
>
> However, while cruising around, I notice your unipop-jdbc/ package. Why
> not just use Pieter Martin's Sqlg project for JDBC/TinkerPop?
>
> https://github.com/pietermartin/sqlg
>
> Perhaps I don't understand the purpose of your package… just a random
> thought.
>
> Thanks,
> Marko.
>
> http://markorodriguez.com
