Re: Implementing TinkerPop on top of GraphX

York, Brennon Thu, 06 Nov 2014 16:40:16 -0800

My personal 2c is that, since GraphX is just beginning to provide a 
full-featured graph API, it would be better to align with the TinkerPop 
group rather than roll our own. In my mind the benefits outweigh the drawbacks 
as follows:

Benefits:
* GraphX gains the ability to become another core tenant within the TinkerPop 
community, allowing a more diverse group of users into the Spark ecosystem.
* TinkerPop can continue to maintain and own a solid / feature-rich graph API 
that has already been accepted by a wide audience, relieving the pressure of 
“one off” API additions from the GraphX team.
* GraphX can demonstrate its ability to be a key player in the GraphDB space, 
sitting in line with other major distributions (Neo4j, Titan, etc.).
* Allows for the abstract graph traversal logic (query API) to be owned and 
maintained by a group already proven on the topic.

Drawbacks:
* GraphX doesn’t own the API for its graph query capability. This could be seen 
as good or bad, but it might make GraphX-specific implementation additions 
trickier. Also, GraphX will need to keep up with the features described in the 
TinkerPop API, as that API might change in the future.

From: Kushal Datta <[email protected]>
Date: Thursday, November 6, 2014 at 4:00 PM
To: "York, Brennon" <[email protected]>
Cc: Kyle Ellrott <[email protected]>, Reynold Xin <[email protected]>, 
"[email protected]" <[email protected]>, Matthias Broecheler 
<[email protected]>
Subject: Re: Implementing TinkerPop on top of GraphX

Before we dive into the implementation details, what are the high-level 
thoughts on Gremlin/GraphX? Scala already provides a procedural way to query 
graphs in GraphX today. So, today I can run g.vertices().filter().join() 
queries as OLAP in GraphX just like TinkerPop3 Gremlin, though without the 
useful operators that Gremlin offers, such as outE, inE, loop, as, dedup, etc. 
In that case, is mapping Gremlin operators to GraphX APIs the better approach, 
or should we extend the existing set of transformations/actions that GraphX 
already offers with the useful operators from Gremlin? For example, we could 
add as(), loop() and dedup() methods to VertexRDD and EdgeRDD.
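
To make the second option a bit more concrete, here is a minimal sketch of 
what chaining Gremlin-style steps directly on a vertex collection could look 
like. It is plain Python over an in-memory edge list, not GraphX or Spark, and 
every name in it (VertexSet, out, in_, dedup, EDGES) is hypothetical and only 
meant to illustrate the shape of such an API:

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# An edge is (source id, destination id, label). Toy model, not GraphX's Edge.
Edge = Tuple[int, int, str]


@dataclass(frozen=True)
class VertexSet:
    """Hypothetical stand-in for a VertexRDD: an ordered bag of vertex ids."""
    ids: Tuple[int, ...]
    edges: Tuple[Edge, ...]

    def filter(self, pred: Callable[[int], bool]) -> "VertexSet":
        # Keep only the vertex ids that satisfy the predicate.
        return VertexSet(tuple(v for v in self.ids if pred(v)), self.edges)

    def out(self, label: Optional[str] = None) -> "VertexSet":
        # Step to the destination of every outgoing edge, optionally by label.
        reached = tuple(
            dst
            for v in self.ids
            for (src, dst, lbl) in self.edges
            if src == v and (label is None or lbl == label)
        )
        return VertexSet(reached, self.edges)

    def in_(self, label: Optional[str] = None) -> "VertexSet":
        # Step to the source of every incoming edge, optionally by label.
        reached = tuple(
            src
            for v in self.ids
            for (src, dst, lbl) in self.edges
            if dst == v and (label is None or lbl == label)
        )
        return VertexSet(reached, self.edges)

    def dedup(self) -> "VertexSet":
        # Drop repeated ids; dict.fromkeys keeps the first-seen order.
        return VertexSet(tuple(dict.fromkeys(self.ids)), self.edges)


EDGES = ((1, 2, "knows"), (1, 3, "knows"), (2, 3, "knows"), (3, 4, "created"))

start = VertexSet((1, 2), EDGES)
friends = start.out("knows")      # vertex 3 is reached twice
unique_friends = friends.dedup()  # each reached vertex kept once
print(friends.ids, unique_friends.ids)
```

In a real implementation the steps would be distributed joins over the vertex 
and edge RDDs rather than nested loops, which is where most of the design work 
would lie.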

Either way we get a desperately needed graph query interface in GraphX.

On Thu, Nov 6, 2014 at 3:25 PM, York, Brennon <[email protected]> wrote:
This was my thought exactly with the TinkerPop3 release. It looks like, to move 
this forward, we’d need to implement gremlin-core per 
<http://www.tinkerpop.com/docs/3.0.0.M1/#_implementing_gremlin_core>. The real 
question is whether GraphX can support only the OLTP functionality, or whether 
we can bake the OLAP requirements into it as well. At first glance I believe 
we could create an entire OLAP system. If so, I believe we could do this as a 
set of parallel subtasks, those being the implementation of each of the 
individual APIs (Structure, Process, and, if OLAP, GraphComputer) necessary 
for gremlin-core. Thoughts?

From: Kyle Ellrott <[email protected]>
Date: Thursday, November 6, 2014 at 12:10 PM
To: Kushal Datta <[email protected]>
Cc: Reynold Xin <[email protected]>, "York, Brennon" <[email protected]>, 
"[email protected]" <[email protected]>, Matthias Broecheler 
<[email protected]>
Subject: Re: Implementing TinkerPop on top of GraphX

I still have to dig into the TinkerPop3 internals (I started my work long 
before it had been released), but I can say that getting the TinkerPop2 Gremlin 
pipeline to work in GraphX was a bit of a hack. The whole TinkerPop2 
Gremlin design was based around streaming pipes of data, rather than large 
distributed map-reduce operations. I had to hack the pipes to aggregate all of 
the data and pass a single object wrapping the GraphX RDDs down the pipes in a 
single go, rather than streaming it element by element.
Just based on their description, TinkerPop3 may be more amenable to the Spark 
platform.
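
The difference between the two pipe styles can be sketched roughly as follows. 
This is plain Python, not the TinkerPop2 Pipes API or sparkgraph's actual code; 
streaming_pipe, bulk_pipe and Bulk are hypothetical names, with Bulk standing 
in for the "single object wrapping the GraphX RDDs":

```python
from typing import Callable, Iterable, Iterator


def streaming_pipe(step: Callable[[int], int],
                   source: Iterable[int]) -> Iterator[int]:
    # Element-by-element style: each item flows through the step on its own.
    for element in source:
        yield step(element)


class Bulk:
    """Hypothetical wrapper for a whole dataset passed down the pipe at once."""

    def __init__(self, data: Iterable[int]) -> None:
        self.data = list(data)

    def transform(self, step: Callable[[int], int]) -> "Bulk":
        # In a Spark setting this would be one distributed map over an RDD.
        return Bulk(step(x) for x in self.data)


def bulk_pipe(bulk_step: Callable[[Bulk], Bulk],
              source: Iterable[Bulk]) -> Iterator[Bulk]:
    # Aggregated style: the pipe sees a single object and transforms it whole.
    for bulk in source:
        yield bulk_step(bulk)


streamed = list(streaming_pipe(lambda x: x + 1, [1, 2, 3]))
bulked = list(bulk_pipe(lambda b: b.transform(lambda x: x + 1), [Bulk([1, 2, 3])]))
print(streamed, [b.data for b in bulked])
```

Both variants produce the same values, but the second pushes only one object 
through the pipeline, so the per-element work can be handed to the underlying 
engine in a single operation.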

Kyle

On Thu, Nov 6, 2014 at 11:55 AM, Kushal Datta <[email protected]> wrote:
What do you guys think about the TinkerPop3 Gremlin interface?
It has MapReduce to run Gremlin operators in a distributed manner and Giraph to 
execute vertex programs.

TinkerPop3 is better suited to GraphX.

On Thu, Nov 6, 2014 at 11:48 AM, Kyle Ellrott <[email protected]> wrote:
I've taken a crack at implementing the TinkerPop Blueprints API in GraphX (
https://github.com/kellrott/sparkgraph ). I've also implemented portions of
the Gremlin Search Language and a Parquet-based graph store.
I've been working to finalize some code details and put together
better code examples and documentation before I start telling people
about it.
But if you want to start looking at the code, I can answer any questions
you have. And if you would like to contribute, I would really appreciate
the help.

Kyle

On Thu, Nov 6, 2014 at 11:42 AM, Reynold Xin <[email protected]> wrote:

> cc Matthias
>
> In the past we talked with Matthias and there were some discussions about
> this.
>
> On Thu, Nov 6, 2014 at 11:34 AM, York, Brennon <[email protected]> wrote:
>
> > All, I was wondering whether there has been any discussion around this
> > topic yet? TinkerPop <https://github.com/tinkerpop> is a great abstraction
> > for graph databases, has been implemented across various graph database
> > backends, and is gaining traction. Has anyone thought about integrating the
> > TinkerPop framework with GraphX to enable GraphX as another backend? Not
> > sure if this has been brought up or not, but I would certainly volunteer to
> > spearhead this effort if the community thinks it to be a good idea!
> >
> > As an aside, I wasn't sure whether this discussion should happen on the
> > board here or on JIRA, but I made a ticket as well for reference:
> > https://issues.apache.org/jira/browse/SPARK-4279
> >
> > ________________________________________________________
> >
> > The information contained in this e-mail is confidential and/or
> > proprietary to Capital One and/or its affiliates. The information
> > transmitted herewith is intended only for use by the individual or entity
> > to which it is addressed.  If the reader of this message is not the
> > intended recipient, you are hereby notified that any review,
> > retransmission, dissemination, distribution, copying or other use of, or
> > taking of any action in reliance upon this information is strictly
> > prohibited. If you have received this communication in error, please
> > contact the sender and delete the material from your computer.

