Sorry for the delay here, been away for a while...
Thanks for clarifying things. That makes sense... it's a bit different from
the MySQL storage engine model, where one simply implements the API for a
new storage engine, plugs it in, and the "top-half" doesn't change (e.g.
the query planner / storage engine driver). Any changes that do need to be
made to the behavior of MySQL are communicated up via the API (e.g. to
disable certain kinds of locking in the top-half that are handled more
performantly by the storage engine itself).
One interesting benefit of this approach is that it offers the user storage
engine independence. An SQL query can "just execute" against a MySQL server
and be agnostic of what storage engine it's using underneath.
It *seems* like Gremlin doesn't really achieve this (maybe it wasn't a goal
of the project to begin with). For instance, to create an index, user code
needs to interface with the storage engine directly. Or to support queries
like the following:
g.V().outE().has('creationDate", gt(sevenDaysAgo)).inV()
with high performance, the user needs to tell the storage engine to keep
the edges sorted by the key 'creationDate'.
Maybe this is changing in Gremlin (or has already changed)?
Jonathan
On Tue, Jun 7, 2016 at 6:40 AM Marko Rodriguez <[email protected]> wrote:
> Hi,
>
> > 4) Some important features are absent from the API. For social networks,
> >> it's often very important to be able to keep edges ordered by a
> particular
> >> property key (say "creationDate") and read edges in a particular
> >> time-window (say between now and 7 days ago) or with a particular limit
> >> (say most recent edges up to 10k edges). TinkerPop currently doesn't
> seem
> >> to support those features. Instead the application has to first read
> *all*
> >> the edges from a vertex and filter them. If the number of edges is very
> >> large and the storage of those edges is remote to the client, this can
> be a
> >> very costly operation.
> >>
> >
> > I don't see what you describe as being "absent". The use cases you
> describe
> > are present in Gremlin right?
> >
> > g.V().outE().has('createdDate", gt(sevenDaysAgo)).inV()
> >
> > If you have a way to improve the speed of that query (like Titan does for
> > instance with vertex centric indices), then you just need to write a
> > TraversalStrategy for your graph implementation that inspects that
> Gremlin
> > on execution and recompiles it to a more efficient form for execution
> > against your graph. TraversalStrategies are the method by which you can
> > really showcase the power of your graph implementation as compared to
> > others.
>
>
>
>
> I would like to extend on this. Note that the raw
> Graph/Vertex/Edge/Property/etc. APIs are the bare minimum to get a Graph
> implementation being TInkerPop-enabled. These APIs are very Iterator-based
> and as such, don’t lend themselves to intelligent “push down
> predicate”-type of data access patterns. This is like this for a reason.
> These APIs are meant to be what any reasonable implementation can provide.
>
> However, graph providers will most definitely want to move beyond these
> simple methods to ensure that the features of their graph system are taken
> full advantage of. As Stephen notes, this is all done via
> TraversalStrategies. Users should NOT be interacting with the Java API, but
> via Traversals and TraversalStrategies are the compilations rules that can
> allow a provider to do stuff like:
>
> 1. Index lookups.
> - g.V().has(“name”,”marko”) // index call to get the
> name=marko vertex
> 2. Push down predicates for intelligent selection of data subsets.
> - g.V().hasLabel(“person”).out(“knows”).hasLabel(“person”)
> // only access the person-partition of the graph
> 3. Grouping steps into “higher order” steps that do more with less
> calls.
> - g.V().outE().has(“timestamp”,gt(2010)).inV() // do
> outE().has().inV() as one mega-step that grabs only the needed data
> 4. …
>
> Each provider will have their own set of Strategies and custom steps that
> compile Gremlin (beyond TinkerPop’s optimizers) to the most efficient
> representation given their system’s unique capabilities.
>
> Finally, you can see how traversal strategies/compilation works with
> .explain().
>
> gremlin>
> g.V().has("name","marko").outE("knows").identity().inV().where(count().is(gt(10))).groupCount().by(both().count()).explain()
> ==>Traversal Explanation
>
> ===============================================================================================================================================================================================================================================================================================
> Original Traversal [GraphStep(vertex,[]),
> HasStep([name.eq(marko)]), VertexStep(OUT,[knows],edge), IdentityStep,
> EdgeVertexStep(IN), TraversalFilterStep([CountGlobalStep, IsStep(gt(10))]),
> GroupCountStep([VertexStep(BOTH,vertex), CountGlobalStep])]
>
> ConnectiveStrategy [D] [GraphStep(vertex,[]),
> HasStep([name.eq(marko)]), VertexStep(OUT,[knows],edge), IdentityStep,
> EdgeVertexStep(IN), TraversalFilterStep([CountGlobalStep, IsStep(gt(10))]),
> GroupCountStep([VertexStep(BOTH,vertex), CountGlobalStep])]
> RangeByIsCountStrategy [O] [GraphStep(vertex,[]),
> HasStep([name.eq(marko)]), VertexStep(OUT,[knows],edge), IdentityStep,
> EdgeVertexStep(IN), TraversalFilterStep([RangeGlobalStep(0,11),
> CountGlobalStep, IsStep(gt(10))]), GroupCountStep([VertexStep(BOTH,vertex),
> CountGlobalStep])]
> IdentityRemovalStrategy [O] [GraphStep(vertex,[]),
> HasStep([name.eq(marko)]), VertexStep(OUT,[knows],edge),
> EdgeVertexStep(IN), TraversalFilterStep([RangeGlobalStep(0,11),
> CountGlobalStep, IsStep(gt(10))]), GroupCountStep([VertexStep(BOTH,vertex),
> CountGlobalStep])]
> IncidentToAdjacentStrategy [O] [GraphStep(vertex,[]),
> HasStep([name.eq(marko)]), VertexStep(OUT,[knows],vertex),
> TraversalFilterStep([RangeGlobalStep(0,11), CountGlobalStep,
> IsStep(gt(10))]), GroupCountStep([VertexStep(BOTH,vertex),
> CountGlobalStep])]
> AdjacentToIncidentStrategy [O] [GraphStep(vertex,[]),
> HasStep([name.eq(marko)]), VertexStep(OUT,[knows],vertex),
> TraversalFilterStep([RangeGlobalStep(0,11), CountGlobalStep,
> IsStep(gt(10))]), GroupCountStep([VertexStep(BOTH,edge), CountGlobalStep])]
> FilterRankingStrategy [O] [GraphStep(vertex,[]),
> HasStep([name.eq(marko)]), VertexStep(OUT,[knows],vertex),
> TraversalFilterStep([RangeGlobalStep(0,11), CountGlobalStep,
> IsStep(gt(10))]), GroupCountStep([VertexStep(BOTH,edge), CountGlobalStep])]
> MatchPredicateStrategy [O] [GraphStep(vertex,[]),
> HasStep([name.eq(marko)]), VertexStep(OUT,[knows],vertex),
> TraversalFilterStep([RangeGlobalStep(0,11), CountGlobalStep,
> IsStep(gt(10))]), GroupCountStep([VertexStep(BOTH,edge), CountGlobalStep])]
> TinkerGraphStepStrategy [P]
> [TinkerGraphStep(vertex,[name.eq(marko)]), VertexStep(OUT,[knows],vertex),
> TraversalFilterStep([RangeGlobalStep(0,11), CountGlobalStep,
> IsStep(gt(10))]), GroupCountStep([VertexStep(BOTH,edge), CountGlobalStep])]
> ProfileStrategy [F]
> [TinkerGraphStep(vertex,[name.eq(marko)]), VertexStep(OUT,[knows],vertex),
> TraversalFilterStep([RangeGlobalStep(0,11), CountGlobalStep,
> IsStep(gt(10))]), GroupCountStep([VertexStep(BOTH,edge), CountGlobalStep])]
> StandardVerificationStrategy [V]
> [TinkerGraphStep(vertex,[name.eq(marko)]), VertexStep(OUT,[knows],vertex),
> TraversalFilterStep([RangeGlobalStep(0,11), CountGlobalStep,
> IsStep(gt(10))]), GroupCountStep([VertexStep(BOTH,edge), CountGlobalStep])]
>
> Final Traversal
> [TinkerGraphStep(vertex,[name.eq(marko)]), VertexStep(OUT,[knows],vertex),
> TraversalFilterStep([RangeGlobalStep(0,11), CountGlobalStep,
> IsStep(gt(10))]), GroupCountStep([VertexStep(BOTH,edge), CountGlobalStep])]
> gremlin>
>
> Marko.