Awesome, thanks Marko!

Good point, I'll try to explain my reasoning behind not using Sqlg (even
though it seems like a great project on its own). I'd be happy to receive
any feedback on it.

First, a bit about the motivation behind Unipop...
Unipop is meant to be a DAL on top of any databases of your choice. The
philosophy is that these days many organizations (like the one I work for)
have a lot of different "kinds" of data, spread across many specialized
data stores (RDBMS, document stores, etc.).

What we wanted to make is a DAL that'll enable us to query all our
different data stores and hundreds of different schemas, including the
relationships between the data, in one simple interface.

There are some projects that try to do the same thing (Drill
<http://drill.apache.org/>, Calcite <https://calcite.incubator.apache.org/>,
Dremel <http://research.google.com/pubs/pub36632.html>), but they use SQL
as the "unified" query language. We figured that in a schema with many
connections, a property-graph representation would be better than a
relational model (trying to avoid "JOIN hell"). So we decided to implement
a Calcite-like application using Gremlin - Unipop.

On the issue of using Sqlg, there were a few design decisions we made in
Unipop that seemed to go against it:

   1. The graph ontology should not depend on the underlying schemas.
   One could choose to represent a table in a database as a vertex, or as a
   vertex + edge (represented by some FK column). You might even choose to
   make a "virtual" vertex (let's say an 'email-address' vertex) that isn't
   represented anywhere physically, but is used as a connection-point between
   other vertices in our ontology (e.g. the user's posts, each stored as a
   document in Elasticsearch). Basically, we shouldn't tie the design of our
   "user-facing" ontology to the design of our optimized data store schemas.
      - OTOH, in Sqlg the schema is (understandably) mapped directly to the
      graph ontology <http://umlg.org/sqlg.html> (take a look at the
      Architecture section).
   2. We must be able to query multiple different data stores in the
   same traversal, and even in the same step. In practice, that meant that
   instead of implementing the process package (Steps, Strategies, etc.) for
   each data store, we made one implementation that coordinates the different
   Controllers (elastic, jdbc, etc.).
      - Before starting work on the jdbc package I scanned through the
      Sqlg code, and (again, understandably) the code seemed heavily
      dependent on the process package.
   3. Translating Gremlin's in/out steps to JOIN statements is a big pain.
   It's probably the hardest part of creating an SQL implementation. We
   figured that for Unipop we'd just bypass that problem, create the JOINs we
   need as views in the DB, and simply map those views to the vertices and
   edges to which they correspond in the graph ontology. (This explanation
   might not be too clear; I can expand on it if anyone's interested.)
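
To make point 3 a bit more concrete, here is a minimal sketch of the idea.
It is written in Python with sqlite3 purely for illustration and is not
Unipop's actual code; the tables, the "authored" view, the EDGE_VIEWS mapping
and the out/in_ helpers are all hypothetical names. The JOIN is defined once
as a view with one row per edge, and the traversal step only ever filters
that view:

```python
import sqlite3

# Hypothetical schema for illustration; none of these names come from Unipop.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'ran'), (2, 'marko');
    INSERT INTO posts VALUES (10, 1, 'hello'), (11, 1, 'views'), (12, 2, 'gremlin');

    -- The JOIN lives in the database as a view: one row per edge.
    CREATE VIEW authored AS
        SELECT u.id AS out_id, p.id AS in_id
        FROM users u JOIN posts p ON p.author_id = u.id;
""")

# Ontology mapping (assumed shape): edge label -> view exposing out_id/in_id.
EDGE_VIEWS = {"authored": "authored"}


def out(conn, vertex_id, label):
    """Follow edges with this label out of vertex_id; no JOIN is built here."""
    view = EDGE_VIEWS[label]  # taken from the fixed mapping, never from user input
    rows = conn.execute(
        f"SELECT in_id FROM {view} WHERE out_id = ? ORDER BY in_id", (vertex_id,)
    )
    return [row[0] for row in rows]


def in_(conn, vertex_id, label):
    """Follow edges with this label into vertex_id (the reverse direction)."""
    view = EDGE_VIEWS[label]
    rows = conn.execute(
        f"SELECT out_id FROM {view} WHERE in_id = ? ORDER BY out_id", (vertex_id,)
    )
    return [row[0] for row in rows]
```

With this in place, something like out(conn, 1, "authored") is a plain
filtered SELECT on the view, so the step implementation never has to
generate JOINs itself.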


I'm going into these details because I'd be happy to get a second opinion
from you guys, about using Sqlg in particular and about the design
decisions in general.

BTW, the same points are probably relevant with regard to using Titan's
Cassandra/HBase/etc. connectors.

Thanks,
Ran

On Tue, 27 Oct 2015 at 16:58 Marko Rodriguez <[email protected]> wrote:

> Hi Ran,
>
> I just submitted a PR to your Unipop project.
>
>         https://github.com/rmagen/unipop/pull/3
>
> However, while cruising around, I notice your unipop-jdbc/ package. Why
> not just use Pieter Martin's Sqlg project for JDBC/TinkerPop?
>
>         https://github.com/pietermartin/sqlg
>
> Perhaps I don't understand the purpose of your package… just a random
> thought.
>
> Thanks,
> Marko.
>
> http://markorodriguez.com
>
>
