Jonathan, thanks for the detailed feedback.

> In general, though, I felt that a developer walk-through-guide of the API
> would have been very helpful.
It would be nice if our Provider Documentation read more that way:

http://tinkerpop.apache.org/docs/3.2.0-incubating/dev/provider/

It would be great to see some of the providers who have had struggles/successes
organize a bit on this list and come up with some specific changes to offer.

> 1) Features in the API are all defaulted to "true". So as someone starting
> off implementing TinkerPop, I need to do this massive implementation of all
> the features interfaces and set almost everything to false. Feels like it
> creates a mess of code from the very start.

I think the recommendation would be to set everything to false and
incrementally add trues as you implement certain features. I'd agree that
trying to just implement it all at once would be quite difficult.

> 2) Documentation can be very confusing. An example is the documentation for
> supportNumericIds and supportStringIds. They each say that the feature
> being true means the internal ID representation is a number / string. But
> why do features say anything in the first place about implementation
> details? As an implementer I should be free to choose the internal
> representation. The documentation goes on to say "In other words, if the
> value returned from {@link Element#id()} is a numeric value then this
> method should return {@code true}." But the weird thing is that both of
> these features have a default value of true for supportNumericIds and
> supportStringIds (and so does the TinkerGraph reference implementation)....
> so how is it possible that Element.id() returns a value that is both a
> Number and a String at the same time?
> It takes a while to figure out that
> what's really going on here is that what the documentation is *really*
> trying to say is "supportNumericIds == true && supportsUserSuppliedIds ==
> true means that 1) the graph supports creating elements with user supplied
> IDs of type Number, 2) You can use Number type IDs to read elements, and 3)
> If you created an element with a Number type ID, then those elements will
> always return that same Number type ID when you call Element.id(). If, on
> the other hand, supportNumericIds == true && supportsUserSuppliedIds ==
> false, then this means that for all elements in the graph, Element.id()
> returns a Number type ID and you can only use Number type IDs to read
> elements from the graph."

agreed - that part is confusing. You seem to have it straight in your head how
that works given your explanation. Could you submit a pull request to help
clarify the javadoc in that regard?

> 4) Some important features are absent from the API. For social networks,
> it's often very important to be able to keep edges ordered by a particular
> property key (say "creationDate") and read edges in a particular
> time-window (say between now and 7 days ago) or with a particular limit
> (say most recent edges up to 10k edges). TinkerPop currently doesn't seem
> to support those features. Instead the application has to first read *all*
> the edges from a vertex and filter them. If the number of edges is very
> large and the storage of those edges is remote to the client, this can be a
> very costly operation.

I don't see what you describe as being "absent". The use cases you describe are
present in Gremlin, right?
g.V().outE().has('createdDate', gt(sevenDaysAgo)).inV()

If you have a way to improve the speed of that query (like Titan does, for
instance, with vertex-centric indices), then you just need to write a
TraversalStrategy for your graph implementation that inspects that Gremlin on
execution and recompiles it to a more efficient form for execution against your
graph. TraversalStrategies are the method by which you can really showcase the
power of your graph implementation as compared to others.

> 5) No standard interfaces for schemas and indexes. I admit, though, this is
> probably a really hard interface to get right for everybody.

The indices experiment in TinkerPop 2.x wasn't good - I don't think we'll look
to bring that back. As it stands, index management schemes for different graphs
have become even more disparate and complex in recent years. Again, if you have
indices, they should be exploited via TraversalStrategies. To some degree, I'm
interested in a good proposal for schema APIs, though I don't know what I'm
looking for to say if it's good or not.

> 6) Unit tests create a new graph for every test, which makes the unit tests
> take a *very* long time if creating a new graph has to make remote
> connections in the case of a distributed graph database. What's really
> annoying is that this is true even the feature being tested is not
> implemented. A new graph has to be created before the check can be made for
> that feature. This means that for developers just starting their
> distributed graph database implementation with nearly zero features
> implemented, the unit tests can take a very long time to run.

I can't remember who I discussed that with before - might have been you.
There's really no good way out of that because we need a Graph instance to
detect features and we can't have a Graph instance without creating one. If
someone has a way around that it would be nice because it would certainly speed
things up.
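To make the vertex-centric index idea from the reply to point 4 a bit more concrete, here is a minimal sketch in plain Java of the kind of per-vertex structure a provider-specific TraversalStrategy could delegate to instead of reading every edge and filtering. It is an illustration under assumptions only: the class TimeOrderedEdges and its methods are hypothetical names, not part of the TinkerPop API, and it shows neither how Titan implements its indices nor how a strategy gets registered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper, NOT TinkerPop API: keeps one vertex's out-edges
// sorted by creation time so a time-window read never scans all edges.
final class TimeOrderedEdges {
    // creation time (epoch millis) -> ids of edges created at that instant
    private final NavigableMap<Long, List<String>> byCreated = new TreeMap<>();

    void add(String edgeId, long createdMillis) {
        byCreated.computeIfAbsent(createdMillis, k -> new ArrayList<>()).add(edgeId);
    }

    // Ids of edges created strictly after sinceMillis, newest first,
    // returning at most `limit` ids.
    List<String> createdAfter(long sinceMillis, int limit) {
        List<String> out = new ArrayList<>();
        // tailMap(..., false) selects keys > sinceMillis; descendingMap()
        // walks that range from the newest timestamp to the oldest.
        for (List<String> ids : byCreated.tailMap(sinceMillis, false).descendingMap().values()) {
            for (String id : ids) {
                if (out.size() == limit) {
                    return out;
                }
                out.add(id);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TimeOrderedEdges edges = new TimeOrderedEdges();
        edges.add("e1", 1_000L);
        edges.add("e2", 5_000L);
        edges.add("e3", 9_000L);
        // Analogous in spirit to has('createdDate', gt(4000)) with a limit of 10.
        System.out.println(edges.createdAfter(4_000L, 10));
    }
}
```

The sorted map means both the time-window filter and the "most recent N" limit touch only the matching range, which is the property a strategy would be exploiting when it rewrites the has() filter above.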
> 7) Gremlin, due to its imperative instead of declarative nature,

It's a shame that we can't seem to get more folks to realize that Gremlin is
not just an imperative language. It has declarative aspects too:

http://tinkerpop.apache.org/docs/current/reference/#match-step

> can be
> very hard to get right for complex queries. When you need to do things
> like, given a source node, traverse that node's friends,
> friends-of-friends, and friends-of-friends-of-friends, and search for
> friends that match a particular criteria and return the top N matching
> friends in the order of distance from the source... in Gremlin this quickly
> becomes a very complex pipeline of carefully placed barriers, filters,
> stores, folds, unfolds, and de-duplications. See here
> <https://github.com/PlatformLab/ldbc-snb-impls/blob/master/snb-interactive-torc/src/main/java/net/ellitron/ldbcsnbimpls/interactive/torc/TorcDb.java#L241>
> for an example of this exact thing.

I didn't have time to dig into that traversal, but it looks overly complex for
what the comments say it is supposed to be doing. Maybe I (Kuppitz will
probably beat me to it) can find some time to take up the challenge to make
that nicer. I'm not so sure you even need the match() step to deal with those
traversal requirements.

On Sat, Jun 4, 2016 at 9:41 PM, Jonathan Ellithorpe <j...@cs.stanford.edu> wrote:

> Hi all, I caught sight of this thread a while back but haven't had much
> time for e-mail...
>
> As an implementer, experientially I found implementing the API to be rather
> difficult / unclear. Below is a random list of some of the gripes that I
> had. Dunno if other implementers have the same gripes but for what it's
> worth you can take this as a data point. In general, though, I felt that a
> developer walk-through-guide of the API would have been very helpful.
> Instead, my journey consisted mostly of checking the (non-transactional)
> TinkerGraph reference implementation, asking questions on this mailing
> list, and sometimes just trying something and seeing if the unit tests
> passed >_<
>
> 1) Features in the API are all defaulted to "true". So as someone starting
> off implementing TinkerPop, I need to do this massive implementation of all
> the features interfaces and set almost everything to false. Feels like it
> creates a mess of code from the very start. As an implementer it would be
> much easier for me if there were like a "minimal feature set reference"
> that I could use to just understand what, at the very minimum, I need to do
> to get something useful working. Then the documentation should guide me
> through the rest of the features that may be of interest to me.
>
> 2) Documentation can be very confusing. An example is the documentation for
> supportNumericIds and supportStringIds. They each say that the feature
> being true means the internal ID representation is a number / string. But
> why do features say anything in the first place about implementation
> details? As an implementer I should be free to choose the internal
> representation. The documentation goes on to say "In other words, if the
> value returned from {@link Element#id()} is a numeric value then this
> method should return {@code true}." But the weird thing is that both of
> these features have a default value of true for supportNumericIds and
> supportStringIds (and so does the TinkerGraph reference implementation)....
> so how is it possible that Element.id() returns a value that is both a
> Number and a String at the same time?
> It takes a while to figure out that
> what's really going on here is that what the documentation is *really*
> trying to say is "supportNumericIds == true && supportsUserSuppliedIds ==
> true means that 1) the graph supports creating elements with user supplied
> IDs of type Number, 2) You can use Number type IDs to read elements, and 3)
> If you created an element with a Number type ID, then those elements will
> always return that same Number type ID when you call Element.id(). If, on
> the other hand, supportNumericIds == true && supportsUserSuppliedIds ==
> false, then this means that for all elements in the graph, Element.id()
> returns a Number type ID and you can only use Number type IDs to read
> elements from the graph."
>
> 3) Some things in the API were painful for me specifically. The data store
> that I'm using is a key-value store that supports transactions. I was not
> able to implement the Graph.vertices() method correctly, because to do so
> in a transaction means reading all the vertices out of the graph
> consistently, which I can't do unless I make one of two highly undesirable
> tradeoffs: 1) create a data structure containing a list of all vertices in
> the graph and keep it consistent (extremely slow), or 2) create a global
> vertex lock that must be acquired before reading/writing any
> vertex/vertices (extremely slow). If I do not allow user supplied IDs, the
> job gets a little bit simpler because I can keep a counter representing the
> high water mark for IDs, and I can read that consistently, and then try to
> read every ID in the range between 1 and the high water mark. But since I
> need to allow user supplied IDs in my graph for running benchmarks, I
> unfortunately don't have this option.
>
> 4) Some important features are absent from the API.
> For social networks,
> it's often very important to be able to keep edges ordered by a particular
> property key (say "creationDate") and read edges in a particular
> time-window (say between now and 7 days ago) or with a particular limit
> (say most recent edges up to 10k edges). TinkerPop currently doesn't seem
> to support those features. Instead the application has to first read *all*
> the edges from a vertex and filter them. If the number of edges is very
> large and the storage of those edges is remote to the client, this can be a
> very costly operation.
>
> 5) No standard interfaces for schemas and indexes. I admit, though, this is
> probably a really hard interface to get right for everybody.
>
> 6) Unit tests create a new graph for every test, which makes the unit tests
> take a *very* long time if creating a new graph has to make remote
> connections in the case of a distributed graph database. What's really
> annoying is that this is true even the feature being tested is not
> implemented. A new graph has to be created before the check can be made for
> that feature. This means that for developers just starting their
> distributed graph database implementation with nearly zero features
> implemented, the unit tests can take a very long time to run.
>
> 7) Gremlin, due to its imperative instead of declarative nature, can be
> very hard to get right for complex queries. When you need to do things
> like, given a source node, traverse that node's friends,
> friends-of-friends, and friends-of-friends-of-friends, and search for
> friends that match a particular criteria and return the top N matching
> friends in the order of distance from the source... in Gremlin this quickly
> becomes a very complex pipeline of carefully placed barriers, filters,
> stores, folds, unfolds, and de-duplications.
> See here
> <https://github.com/PlatformLab/ldbc-snb-impls/blob/master/snb-interactive-torc/src/main/java/net/ellitron/ldbcsnbimpls/interactive/torc/TorcDb.java#L241>
> for an example of this exact thing.
>
> On Thu, May 26, 2016 at 4:14 AM Stephen Mallette <spmalle...@gmail.com>
> wrote:
>
> > > People who just click links and copy paste code snippets w/o being aware
> > > of 2.x/3.x sort of stuff
> >
> > that is a fascinating phenomena, isn't it.
> >
> > On Thu, May 26, 2016 at 7:00 AM, Marko Rodriguez <okramma...@gmail.com>
> > wrote:
> >
> > > > You know what else i just noticed?
> > > >
> > > > http://gremlin.tinkerpop.com
> > > > http://rexster.tinkerpop.com
> > > > .... and so on
> > > >
> > > > all still point to the old wikis - should we consider changing those to
> > > > have them all just go to the 3.x home page?
> > >
> > > I believe there are more people on 2.x than there are on 3.x. Thus, while
> > > we think 2.x is basically MS-DOS, we shouldn't hinder others still relying
> > > on those links. Over time things will wash out and also, if you are serious
> > > about TinkerPop, you will know whats going on. People who just click links
> > > and copy paste code snippets w/o being aware of 2.x/3.x sort of stuff
> > > aren't worth the trouble we will cause those who legitimately still use 2.x
> > > and the respective docs.
> > >
> > > Marko.
> > >
> > > > On Wed, May 25, 2016 at 1:50 PM, Stephen Mallette <spmalle...@gmail.com>
> > > > wrote:
> > > >
> > > >>> but maybe perhaps Stephen would like to get a talk like that at
> > > >>> Cassandra Summit?
> > > >>
> > > >> hehe - i'm sure you could have found a way to make that topic interesting.
> > > >> i'm not so sure i could. :)
> > > >>
> > > >>> Psssht, the original Uni_pop _has Tinkerpop support, and a better
> > > >>> unicorn logo...
> > > >>
> > > >> ha!
> > > >>
> > > >>> Seriously though, for wider Tinkerpop adoption it would be cool to have a
> > > >>> general "Provider Template" along with the tutorial/blogpost :
> > > >>
> > > >> the "provider template" could be an addition to the maven archetypes we
> > > >> have then you just do:
> > > >>
> > > >> mvn archetype:generate -DarchetypeGroupId=org.apache.tinkerpop
> > > >> -DarchetypeArtifactId=gremlin-archetype-provider -DarchetypeVersion=3.2.1
> > > >> -DgroupId=com.my -DartifactId=my-tinkerpop-implementation
> > > >>
> > > >> i'm not sure i want to volunteer to do that one, but that would be kinda
> > > >> cool. the only downside is that it's a fair bit of trouble to maintain a
> > > >> template/archetype for something that probably won't see a ton of use -
> > > >> unless there are suddenly hundreds of tinkerpop implementations :)
> > > >>
> > > >> On Wed, May 25, 2016 at 12:39 PM, Marko Rodriguez <okramma...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> More stuff ---
> > > >>>
> > > >>> One thing I think that we really need to drive home to providers is
> > > >>> TraversalStrategies. That should be a blog post too. I've talked to two
> > > >>> graph databases providers recently and both were concerned about
> > > >>> performance through the TinkerPop API. They didn't know they could write
> > > >>> provider-specific strategies to bypass TinkerPop and talk directly to their
> > > >>> databases native APIs/optimizations. Once that was clear, both were like
> > > >>> "ahhhhhhh…."
> > > >>>
> > > >>> Marko.
> > > >>>
> > > >>> http://markorodriguez.com
> > > >>>
> > > >>> On May 25, 2016, at 9:18 AM, Ran Magen <rma...@gmail.com> wrote:
> > > >>>
> > > >>>> Psssht, the original Uni_pop _has Tinkerpop support, and a better unicorn
> > > >>>> logo...
> > > >>>>
> > > >>>> Seriously though, for wider Tinkerpop adoption it would be cool to have a
> > > >>>> general "Provider Template" along with the tutorial/blogpost :
> > > >>>>
> > > >>>> * Default `structure` implementation, with /*IMPLEMENT READ/WRITE/ETC
> > > >>>>   HERE*/ in the relevant places.
> > > >>>> * Default `process` implementations (i.e. `TraversalStrategy`s). This
> > > >>>>   should probably be "commented out" at first, and "uncommented" after the
> > > >>>>   basic structure implementation is working.
> > > >>>> * Default setup of test suites.
> > > >>>> * Configurations
> > > >>>> * pom.xml
> > > >>>> * Gremlin Console plugin
> > > >>>> * Utility scripts (e.g. deploy&run in console/server)
> > > >>>>
> > > >>>> On May 25 2016, at 5:36 pm, Jason Plurad <plur...@gmail.com> wrote:
> > > >>>>
> > > >>>>> Agreed. A big on-going problem TinkerPop has is that people invariably
> > > >>>>> stumble upon TinkerPop 2 and Blueprints/Pipes. If they find TP2, maybe they
> > > >>>>> presume it is dead, so they roll their own.
> > > >>>>>
> > > >>>>> I've been tinkering recently in this space, more specifically to better
> > > >>>>> understand the gremlin-test suite in general. A blog post sounds like a
> > > >>>>> good idea. I can take a stab at it.
> > > >>>>>
> > > >>>>> On Wed, May 25, 2016 at 10:25 AM, Dylan Millikin
> > > >>>>> <dylan.milli...@gmail.com> wrote:
> > > >>>>>
> > > >>>>> > Maybe working on referencing these pages via perhaps a blog post from
> > > >>>>> > someone would be cool. Something along the lines of "Creating a graph db
> > > >>>>> > with Tinkerpop" or some other variation that may get good hit results in a
> > > >>>>> > google search.
> > > >>>>> >
> > > >>>>> > On Wed, May 25, 2016 at 10:06 AM, Stephen Mallette
> > > >>>>> > <spmalle...@gmail.com> wrote:
> > > >>>>> >
> > > >>>>> > > We've seen a lot of new graphs come out that don't do TinkerPop from the
> > > >>>>> > > start. Perhaps they make a conscious decision not to - i dunno. I just
> > > >>>>> > > wonder if part of the problem is the provider docs for doing an
> > > >>>>> > > implementation:
> > > >>>>> > >
> > > >>>>> > > <http://tinkerpop.apache.org/docs/3.2.0-incubating/dev/provider/>
> > > >>>>> > >
> > > >>>>> > > are they easy enough to find? do folks understand them and what it means to
> > > >>>>> > > be tinkerpop-enabled? the docs could probably be improved - any graph
> > > >>>>> > > providers out there want to take a stab at it? in some ways your external
> > > >>>>> > > experience at implementing might be helpful in improving them.
> > > >>>>> > >
> > > >>>>> > > On Tue, May 24, 2016 at 12:40 PM, Marko Rodriguez
> > > >>>>> > > <okramma...@gmail.com> wrote:
> > > >>>>> > >
> > > >>>>> > > > Hi,
> > > >>>>> > > >
> > > >>>>> > > > See https://github.com/haifengl/unicorn
> > > >>>>> > > >
> > > >>>>> > > > They say they support a "Gremlin-like API." It would be really cool if
> > > >>>>> > > > they just implemented TinkerPop's Graph API. Perhaps someone feels like
> > > >>>>> > > > creating a ticket at their main repo explaining how to go about supporting
> > > >>>>> > > > TinkerPop? Or, even better, providing them a PR!
> > > >>>>> > > >
> > > >>>>> > > > https://github.com/adplabs/unicorn
> > > >>>>> > > >
> > > >>>>> > > > Take care,
> > > >>>>> > > > Marko.
> > > >>>>> > > >
> > > >>>>> > > > <http://markorodriguez.com>