Hi all, I caught sight of this thread a while back but haven't had much
time for e-mail...
As an implementer, I found implementing the API to be rather difficult and
unclear. Below is an unordered list of some of the gripes that I had. Dunno
if other implementers have the same gripes, but for what it's worth you can
take this as a data point. In general, though, I felt that a developer
walk-through guide of the API would have been very helpful.
Instead, my journey consisted mostly of checking the (non-transactional)
TinkerGraph reference implementation, asking questions on this mailing
list, and sometimes just trying something and seeing if the unit tests
passed >_<
1) Features in the API all default to "true". So as someone starting off
implementing TinkerPop, I need to write this massive implementation of all
the feature interfaces and set almost everything to false. It feels like it
creates a mess of code from the very start. As an implementer it would be
much easier for me if there were a "minimal feature set reference" that I
could use to understand what, at the very minimum, I need to do to get
something useful working. Then the documentation should guide me through
the rest of the features that may be of interest to me.
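
To make the point in 1) concrete, here is a minimal sketch in plain Java.
The interfaces and class names below are hypothetical stand-ins made up to
mimic the default-to-true pattern; they are not the real TinkerPop feature
interfaces, and MinimalFeatures is only an illustration of the "minimal
feature set reference" idea, not something that exists in the API.

```java
// Hypothetical stand-in, NOT the real TinkerPop interface: every feature
// defaults to true, so a new provider must override each one it lacks.
interface SketchGraphFeatures {
    default boolean supportsTransactions() { return true; }
    default boolean supportsPersistence() { return true; }
    default boolean supportsComputer() { return true; }
}

// What a provider ends up writing: opt out of everything, one by one.
class OptOutFeatures implements SketchGraphFeatures {
    @Override public boolean supportsTransactions() { return false; }
    @Override public boolean supportsPersistence() { return false; }
    @Override public boolean supportsComputer() { return false; }
}

// Hypothetical alternative: a shared base where everything is false...
class MinimalFeatures implements SketchGraphFeatures {
    @Override public boolean supportsTransactions() { return false; }
    @Override public boolean supportsPersistence() { return false; }
    @Override public boolean supportsComputer() { return false; }
}

// ...so a provider opts in only to what it has actually built so far.
class MyStoreFeatures extends MinimalFeatures {
    @Override public boolean supportsTransactions() { return true; }
}
```

With a base like this, the list of overrides in a provider doubles as a
readable summary of what the implementation supports.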
2) Documentation can be very confusing. An example is the documentation for
supportsNumericIds and supportsStringIds. They each say that the feature
being true means the internal ID representation is a number / string. But
why do features say anything in the first place about implementation
details? As an implementer I should be free to choose the internal
representation. The documentation goes on to say "In other words, if the
value returned from {@link Element#id()} is a numeric value then this
method should return {@code true}." But the weird thing is that both
supportsNumericIds and supportsStringIds default to true (and so does the
TinkerGraph reference implementation)... so how is it possible that
Element.id() returns a value that is both a Number and a String at the same
time? It takes a while to figure out that what the documentation is
*really* trying to say is "supportsNumericIds == true &&
supportsUserSuppliedIds == true means that (a) the graph supports creating
elements with user-supplied IDs of type Number, (b) you can use Number type
IDs to read elements, and (c) if you created an element with a Number type
ID, then that element will always return that same Number type ID when you
call Element.id(). If, on the other hand, supportsNumericIds == true &&
supportsUserSuppliedIds == false, then this means that for all elements in
the graph, Element.id() returns a Number type ID and you can only use
Number type IDs to read elements from the graph."
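
The reading of the ID flags described in 2) can be sketched as a toy store
in plain Java. Everything here (IdSketchStore, its constructor flag, its
methods) is hypothetical and only models that interpretation; it is not
TinkerPop code and says nothing about how any real provider stores IDs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical toy store. With user-supplied IDs on, "numeric" and "string"
// describe which ID types callers may pass in and get back unchanged.
// With user-supplied IDs off, every ID is generated and is always a Long.
class IdSketchStore {
    private final boolean userSuppliedIds;
    private final Map<Object, String> vertices = new HashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    IdSketchStore(boolean userSuppliedIds) {
        this.userSuppliedIds = userSuppliedIds;
    }

    // Returns the ID that later lookups must use (what id() would return).
    Object addVertex(Object requestedId, String label) {
        Object id;
        if (userSuppliedIds && requestedId != null) {
            if (!(requestedId instanceof Number) && !(requestedId instanceof String)) {
                throw new IllegalArgumentException("unsupported ID type");
            }
            id = requestedId;               // stored and returned as supplied
        } else {
            id = nextId.getAndIncrement();  // store-generated numeric ID
        }
        vertices.put(id, label);
        return id;
    }

    // Looks a vertex up by exactly the ID that addVertex returned.
    String label(Object id) {
        return vertices.get(id);
    }
}
```

Under this reading, both flags being true at once is not a contradiction:
the same store hands back a Number for one element and a String for another.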
3) Some things in the API were painful for me specifically. The data store
that I'm using is a key-value store that supports transactions. I was not
able to implement the Graph.vertices() method correctly, because to do so
in a transaction means reading all the vertices out of the graph
consistently, which I can't do unless I make one of two highly undesirable
tradeoffs: 1) create a data structure containing a list of all vertices in
the graph and keep it consistent (extremely slow), or 2) create a global
vertex lock that must be acquired before reading/writing any
vertex/vertices (extremely slow). If I do not allow user supplied IDs, the
job gets a little bit simpler because I can keep a counter representing the
high water mark for IDs, and I can read that consistently, and then try to
read every ID in the range between 1 and the high water mark. But since I
need to allow user supplied IDs in my graph for running benchmarks, I
unfortunately don't have this option.
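
For what it's worth, the high-water-mark approach from 3) can be sketched
roughly as below. This is plain Java with a HashMap standing in for one
transaction's view of a transactional key-value store; the key names
("vertex:hwm", "vertex:<id>") and the class are made up for illustration,
and it only works when IDs are store-generated.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the Map plays the role of a consistent snapshot of
// the key-value store inside a single transaction.
class HighWaterMarkScan {
    static final String COUNTER_KEY = "vertex:hwm";

    // Allocates the next ID from the counter and stores the vertex under it.
    static long addVertex(Map<String, String> kv, String value) {
        long id = Long.parseLong(kv.getOrDefault(COUNTER_KEY, "0")) + 1;
        kv.put(COUNTER_KEY, Long.toString(id));
        kv.put("vertex:" + id, value);
        return id;
    }

    // Reads the counter once, then probes every ID in [1, hwm].
    // Missing keys correspond to deleted vertices and are skipped.
    static List<String> vertices(Map<String, String> kv) {
        long hwm = Long.parseLong(kv.getOrDefault(COUNTER_KEY, "0"));
        List<String> out = new ArrayList<>();
        for (long id = 1; id <= hwm; id++) {
            String v = kv.get("vertex:" + id);
            if (v != null) {
                out.add(v);
            }
        }
        return out;
    }
}
```

With user-supplied IDs there is no such dense range to probe, which is why
this trick is not available in that case.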
4) Some important features are absent from the API. For social networks,
it's often very important to be able to keep edges ordered by a particular
property key (say "creationDate") and read edges in a particular
time-window (say between now and 7 days ago) or with a particular limit
(say most recent edges up to 10k edges). TinkerPop currently doesn't seem
to support those features. Instead the application has to first read *all*
the edges from a vertex and filter them. If the number of edges is very
large and the storage of those edges is remote to the client, this can be a
very costly operation.
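
As a rough illustration of the kind of read described in 4), here is a
plain Java sketch of one vertex's edges kept sorted by creationDate. The
class and method names are hypothetical; this is not a TinkerPop API, just
the shape of the operation a provider could serve without reading every
edge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: edges of a single vertex, ordered by creationDate,
// so a window or limit read only touches the range that is needed.
class OrderedEdgeList {
    // creationDate (epoch millis) -> IDs of edges created at that instant
    private final TreeMap<Long, List<String>> byDate = new TreeMap<>();

    void add(long creationDate, String edgeId) {
        byDate.computeIfAbsent(creationDate, k -> new ArrayList<>()).add(edgeId);
    }

    // Edges with from <= creationDate <= to, newest first, at most limit.
    List<String> recent(long from, long to, int limit) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, List<String>> e
                : byDate.subMap(from, true, to, true).descendingMap().entrySet()) {
            for (String edgeId : e.getValue()) {
                if (out.size() == limit) {
                    return out;
                }
                out.add(edgeId);
            }
        }
        return out;
    }
}
```

The sorted map does the filtering here; in the read-all-then-filter
approach the client would have to pull every edge first.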
5) No standard interfaces for schemas and indexes. I admit, though, this is
probably a really hard interface to get right for everybody.
6) Unit tests create a new graph for every test, which makes the unit tests
take a *very* long time if creating a new graph has to make remote
connections in the case of a distributed graph database. What's really
annoying is that this is true even if the feature being tested is not
implemented. A new graph has to be created before the check can be made for
that feature. This means that for developers just starting their
distributed graph database implementation with nearly zero features
implemented, the unit tests can take a very long time to run.
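
One way the ordering in 6) could be flipped is sketched below in plain
Java. This is a hypothetical harness, not the gremlin-test suite, and it
assumes the provider's feature set can be inspected without first creating
a graph instance; FeatureSet, runIfSupported and the feature names are all
made up for illustration.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: the feature check runs first, and the (possibly
// expensive, remote) graph is only created if the test will actually run.
class LazyGraphTestRunner {
    interface FeatureSet {
        boolean supports(String feature);
    }

    // Returns true if the test body ran, false if it was skipped.
    static <G> boolean runIfSupported(FeatureSet features, String required,
                                      Supplier<G> graphFactory, Consumer<G> test) {
        if (!features.supports(required)) {
            return false;               // skipped: graphFactory is never invoked
        }
        test.accept(graphFactory.get());
        return true;
    }
}
```

For an implementation with nearly zero features, almost every test would
then be skipped without ever opening a remote connection.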
7) Gremlin, due to its imperative rather than declarative nature, can be
very hard to get right for complex queries. When you need to do things
like, given a source node, traverse that node's friends,
friends-of-friends, and friends-of-friends-of-friends, search for friends
that match particular criteria, and return the top N matching friends in
order of distance from the source... in Gremlin this quickly becomes a very
complex pipeline of carefully placed barriers, filters, stores, folds,
unfolds, and de-duplications. See here for an example of this exact thing:
<https://github.com/PlatformLab/ldbc-snb-impls/blob/master/snb-interactive-torc/src/main/java/net/ellitron/ldbcsnbimpls/interactive/torc/TorcDb.java#L241>
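
For reference, what the query in 7) computes is essentially a bounded
breadth-first search. The sketch below is plain Java over an in-memory
adjacency map; it is not Gremlin and not the code in the linked
TorcDb.java, and the class, method and parameter names are made up. It is
only meant to pin down the intended result that the Gremlin pipeline has to
reproduce.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of the friends / friends-of-friends search.
class FriendSearchSketch {
    // Breadth-first search out to maxHops; returns up to n matching people
    // in order of increasing distance from the source (source excluded).
    static List<String> topMatches(Map<String, List<String>> friends, String source,
                                   int maxHops, Predicate<String> matches, int n) {
        Map<String, Integer> distance = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        List<String> out = new ArrayList<>();
        distance.put(source, 0);
        queue.add(source);
        while (!queue.isEmpty() && out.size() < n) {
            String person = queue.remove();
            int d = distance.get(person);
            if (d > 0 && matches.test(person)) {
                out.add(person);
            }
            if (d == maxHops) {
                continue;               // do not expand past the hop limit
            }
            for (String f : friends.getOrDefault(person, Collections.emptyList())) {
                // putIfAbsent de-duplicates: each person is queued only once,
                // at the shortest distance at which they were first seen
                if (distance.putIfAbsent(f, d + 1) == null) {
                    queue.add(f);
                }
            }
        }
        return out;
    }
}
```

The visited map and the queue order are what the barriers, stores and
de-duplication steps have to emulate in the traversal.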
On Thu, May 26, 2016 at 4:14 AM Stephen Mallette <[email protected]>
wrote:
> > People who just click links and copy paste code snippets w/o being aware
> of 2.x/3.x sort of stuff
>
> that is a fascinating phenomena, isn't it.
>
> On Thu, May 26, 2016 at 7:00 AM, Marko Rodriguez <[email protected]>
> wrote:
>
> > > You know what else i just noticed?
> > >
> > > http://gremlin.tinkerpop.com
> > > http://rexster.tinkerpop.com
> > > .... and so on
> > >
> > > all still point to the old wikis - should we consider changing those to
> > > have them all just go to the 3.x home page?
> >
> > I believe there are more people on 2.x than there are on 3.x. Thus, while
> > we think 2.x is basically MS-DOS, we shouldn't hinder others still
> relying
> > on those links. Over time things will wash out and also, if you are
> serious
> > about TinkerPop, you will know whats going on. People who just click
> links
> > and copy paste code snippets w/o being aware of 2.x/3.x sort of stuff
> > aren't worth the trouble we will cause those who legitimately still use
> 2.x
> > and the respective docs.
> >
> > Marko.
> >
> >
> >
> >
> > > On Wed, May 25, 2016 at 1:50 PM, Stephen Mallette <
> [email protected]>
> > > wrote:
> > >
> > >>> but maybe perhaps Stephen would like to get a talk like that at
> > >> Cassandra Summit?
> > >>
> > >> hehe - i'm sure you could have found a way to make that topic
> > interesting.
> > >> i'm not so sure i could. :)
> > >>
> > >>> Psssht, the original Uni_pop _has Tinkerpop support, and a better
> > >> unicorn logo...
> > >>
> > >> ha!
> > >>
> > >>> Seriously though, for wider Tinkerpop adoption it would be cool to
> > have
> > >> a
> > >> general "Provider Template" along with the tutorial/blogpost :
> > >>
> > >> the "provider template" could be an addition to the maven archetypes
> we
> > >> have then you just do:
> > >>
> > >> mvn archetype:generate -DarchetypeGroupId=org.apache.tinkerpop
> > >> -DarchetypeArtifactId=gremlin-archetype-provider
> > -DarchetypeVersion=3.2.1
> > >> -DgroupId=com.my-DartifactId=my-tinkerpop-implementation
> > >>
> > >> i'm not sure i want to volunteer to do that one, but that would be
> kinda
> > >> cool. the only downside is that it's a fair bit of trouble to
> maintain a
> > >> template/archetype for something that probably won't see a ton of use
> -
> > >> unless there are suddenly hundreds of tinkerpop implementations :)
> > >>
> > >>
> > >>
> > >> On Wed, May 25, 2016 at 12:39 PM, Marko Rodriguez <
> [email protected]
> > >
> > >> wrote:
> > >>
> > >>> More stuff ---
> > >>>
> > >>> One thing I think that we really need to drive home to providers is
> > >>> TraversalStrategies. That should be a blog post too. I've talked to
> two
> > >>> graph databases providers recently and both were concerned about
> > >>> performance through the TinkerPop API. They didn't know they could
> > write
> > >>> provider-specific strategies to bypass TinkerPop and talk directly to
> > their
> > >>> databases native APIs/optimizations. Once that was clear, both were
> > like
> > >>> "ahhhhhhh…."
> > >>>
> > >>> Marko.
> > >>>
> > >>> http://markorodriguez.com
> > >>>
> > >>> On May 25, 2016, at 9:18 AM, Ran Magen <[email protected]> wrote:
> > >>>
> > >>>> Psssht, the original Uni_pop _has Tinkerpop support, and a better
> > >>> unicorn
> > >>>> logo...
> > >>>>
> > >>>>
> > >>>>
> > >>>> Seriously though, for wider Tinkerpop adoption it would be cool to
> > >>> have a
> > >>>> general "Provider Template" along with the tutorial/blogpost :
> > >>>>
> > >>>> * Default `structure` implementation, with /*IMPLEMENT
> READ/WRITE/ETC
> > >>> HERE*/ in the relevant places.
> > >>>>
> > >>>> * Default `process` implemantions (i.e. `TraversalStrategy`s). This
> > >>> should probably be "commented out" at first, and "uncommented" after
> > the
> > >>> basic structure implementation is working.
> > >>>> * Default setup of test suites.
> > >>>> * Configurations
> > >>>> * pom.xml
> > >>>> * Gremlin Console plugin
> > >>>> * Utility scripts (e.g. deploy&run in console/server)
> > >>>>
> > >>>> On May 25 2016, at 5:36 pm, Jason Plurad <[email protected]>
> > >>> wrote:
> > >>>>
> > >>>>> Agreed. A big on-going problem TinkerPop has is that people
> > invariably
> > >>>> stumble upon TinkerPop 2 and Blueprints/Pipes. If they find TP2,
> maybe
> > >>> they
> > >>>> presume it is dead, so they roll their own.
> > >>>>
> > >>>>>
> > >>>>
> > >>>>> I've been tinkering recently in this space, more specifically to
> > better
> > >>>> understand the gremlin-test suite in general. A blog post sounds
> like
> > a
> > >>>> good idea. I can take a stab at it.
> > >>>>
> > >>>>>
> > >>>>
> > >>>>> On Wed, May 25, 2016 at 10:25 AM, Dylan Millikin
> > >>>> <[email protected]>
> > >>>> wrote:
> > >>>>
> > >>>>>
> > >>>>
> > >>>>> > Maybe working on referencing these pages via perhaps a blog
> post
> > >>> from
> > >>>> > someone would be cool. Something along the lines of "Creating a
> > >>> graph db
> > >>>> > with Tinkerpop" or some other variation that may get good hit
> > >>> results in
> > >>>> a
> > >>>> > google search.
> > >>>> >
> > >>>> > On Wed, May 25, 2016 at 10:06 AM, Stephen Mallette
> > >>>> <[email protected]>
> > >>>> > wrote:
> > >>>> >
> > >>>> > > We've seen a lot of new graphs come out that don't do
> > >>> TinkerPop from
> > >>>> the
> > >>>> > > start. Perhaps they make a conscious decision not to - i
> > >>> dunno. I
> > >>>> just
> > >>>> > > wonder if part of the problem is the provider docs for
> doing
> > >>> an
> > >>>> > > implementation:
> > >>>> > >
> > >>>> > > <
> > >>> http://tinkerpop.apache.org/docs/3.2.0-incubating/dev/provider/>
> > >>>> > >
> > >>>> > > are they easy enough to find? do folks understand them and
> > >>> what it
> > >>>> means
> > >>>> > to
> > >>>> > > be tinkerpop-enabled? the docs could probably be improved
> -
> > >>> any
> > >>>> graph
> > >>>> > > providers out there want to take a stab at it? in some
> ways
> > >>> your
> > >>>> external
> > >>>> > > experience at implementing might be helpful in improving
> > them.
> > >>>> > >
> > >>>> > > On Tue, May 24, 2016 at 12:40 PM, Marko Rodriguez
> > >>>> <[email protected]>
> > >>>> > > wrote:
> > >>>> > >
> > >>>> > > > Hi,
> > >>>> > > >
> > >>>> > > > See https://github.com/haifengl/unicorn
> > >>>> > > >
> > >>>> > > > They say they support a "Gremlin-like API." It would
> be
> > >>> really
> > >>>> cool if
> > >>>> > > > they just implemented TinkerPop's Graph API. Perhaps
> > >>> someone
> > >>>> feels like
> > >>>> > > > creating a ticket at their main repo explaining how
> to
> > >>> go about
> > >>>> > > supporting
> > >>>> > > > TinkerPop? Or, even better, providing them a PR!
> > >>>> > > >
> > >>>> > > > https://github.com/adplabs/unicorn
> > >>>> > > >
> > >>>> > > > Take care,
> > >>>> > > > Marko.
> > >>>> > > >
> > >>>> > > > <http://markorodriguez.com>
> > >>>> > > >
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> > >>>
> > >>>
> > >>
> >
> >
>