Hi Vasia,
> Flink doesn't have a graph query language yet, so Gremlin support would be
> a really nice contribution.
> I have read the blog post and also the Gremlin paper. There are some really
> great ideas in there!
Great. Glad you are excited about Gremlin.
> I'm currently quite busy with several projects, so I don't see myself
> working on a FlinkGraphComputer soon. If someone from the TinkerPop
> community would like to take this on, I (and the rest of the Flink
> community) would of course be more than happy to provide feedback and help
> with Flink-related issues. Otherwise, I'll get back to you once my load
> levels decrease a bit :)
In the past, TinkerPop used to be a "dumping ground" for all implementations,
but we decided for TinkerPop3 that we would only have "reference
implementations" so users can play, system providers can learn, and ultimately,
system providers would provide TinkerPop support in their distribution. As
such, we would like to have FlinkGraphComputer distributed with Flink. If that
sounds like something your project would be comfortable with, I think we can
provide a JIRA/PR for FlinkGraphComputer (as well as any necessary
documentation). We can start with a JIRA ticket to get things going. Thoughts?
Besides some I/O stuff (InputFormats, RDDs, etc.), this is the beef of the
SparkGraphComputer implementation:
https://github.com/apache/incubator-tinkerpop/tree/master/spark-gremlin/src/main/java/org/apache/tinkerpop/gremlin/spark/process/computer
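
For anyone new to the vertex-centric model discussed in this thread, here is a minimal sketch of the superstep loop that a graph computer has to drive. It is plain Java and uses neither the TinkerPop nor the Flink API. The class name VertexProgramSketch, the method run, and the choice of algorithm (propagating the smallest vertex id as a component label) are all assumptions made only for illustration. It assumes an undirected graph given as an adjacency map with edges listed in both directions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: not TinkerPop's GraphComputer or Gelly's API.
public class VertexProgramSketch {

    // Runs scatter/gather supersteps until no vertex changes or maxSupersteps is reached.
    // Each vertex ends up labeled with the smallest vertex id reachable from it.
    static Map<Integer, Integer> run(Map<Integer, List<Integer>> adjacency, int maxSupersteps) {
        // Setup: every vertex (including ones that only appear as neighbors) labels itself.
        Map<Integer, Integer> label = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> e : adjacency.entrySet()) {
            label.putIfAbsent(e.getKey(), e.getKey());
            for (Integer n : e.getValue()) {
                label.putIfAbsent(n, n);
            }
        }

        Set<Integer> active = new HashSet<>(label.keySet());
        for (int step = 0; step < maxSupersteps && !active.isEmpty(); step++) {
            // Scatter: each active vertex sends its label to its neighbors.
            // Messages to the same target are combined by keeping the minimum.
            Map<Integer, Integer> inbox = new HashMap<>();
            for (Integer v : active) {
                List<Integer> neighbors = adjacency.getOrDefault(v, Collections.<Integer>emptyList());
                for (Integer n : neighbors) {
                    inbox.merge(n, label.get(v), Math::min);
                }
            }

            // Gather: a vertex adopts a smaller label and stays active for the next superstep.
            Set<Integer> next = new HashSet<>();
            for (Map.Entry<Integer, Integer> m : inbox.entrySet()) {
                if (m.getValue() < label.get(m.getKey())) {
                    label.put(m.getKey(), m.getValue());
                    next.add(m.getKey());
                }
            }
            active = next;
        }
        return label;
    }

    public static void main(String[] args) {
        // Two components: 1-2-3 and 4-5, with edges listed in both directions.
        Map<Integer, List<Integer>> g = new HashMap<>();
        g.put(1, new ArrayList<>(Arrays.asList(2)));
        g.put(2, new ArrayList<>(Arrays.asList(1, 3)));
        g.put(3, new ArrayList<>(Arrays.asList(2)));
        g.put(4, new ArrayList<>(Arrays.asList(5)));
        g.put(5, new ArrayList<>(Arrays.asList(4)));
        System.out.println(run(g, 10));
    }
}
```

Only vertices whose label changed stay active, so the working set shrinks each superstep. That is roughly the pattern that iteration operators in a dataflow engine are meant to exploit, though the real implementations linked above deal with partitioning and I/O that this sketch ignores.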
> Keep up the great work!
Thanks, you too.
Marko.
http://markorodriguez.com
>
> On 4 December 2015 at 11:28, James Thornton <[email protected]> wrote:
>
>> *Vasia Kalavri: Gelly: Large-scale graph analysis with Apache Flink*
>>
>> https://youtu.be/-tFzG2dzJXw
>> On Nov 30, 2015 12:49 PM, "Marko Rodriguez" <[email protected]> wrote:
>>
>>> Hi Vasia (everyone),
>>>
>>> Does Flink have a graph query language? If not, then with a
>>> FlinkGraphComputer implementation, Flink could ship with Gremlin support.
>>>
>>> If you have the time, please read the following blog post as it will help
>>> explain our approach and how Flink could benefit from it:
>>>
>>>
>> http://www.datastax.com/dev/blog/the-benefits-of-the-gremlin-graph-traversal-machine
>>>
>>> In short, if Flink provides a FlinkGraphComputer implementation, then the
>>> Gremlin virtual machine will work over Flink and any language that
>> compiles
>>> to the Gremlin virtual machine will thus work over Flink.
>>>
>>> If you would like to see a demo of TinkerPop with, for example, Spark or
>>> Giraph, I'd be more than happy to do a Google Hangout session with you
>> (< 1
>>> hour) so you can better understand the breadth of the work we are doing
>> and
>>> how it can benefit your efforts.
>>>
>>> Thanks Vasia,
>>> Marko.
>>>
>>> http://markorodriguez.com
>>>
>>> On Nov 27, 2015, at 5:27 AM, Stephen Mallette <[email protected]>
>>> wrote:
>>>
>>>> Hi Vasia, I had started tinkering on it in my spare time in a separate
>>>> repo. There really isn't much to collaborate on at this point. I was
>>>> mostly trying to understand the parallels between Flink and Spark so
>>> that I
>>>> could understand how a FlinkGraphComputer could be implemented given
>> what
>>>> I'd seen of the Spark implementation Marko did. I had expected to
>>>> contribute the work to Flink (rather than keep it here on the TinkerPop
>>>> side). Anyway, not much else to offer - Marko can probably get you
>>> running
>>>> much faster than I can, as that area is where he holds the most
>>> expertise.
>>>> You should probably keep an eye out for his comments.
>>>>
>>>>
>>>>
>>>> On Wed, Nov 25, 2015 at 11:38 AM, Vasiliki Kalavri <[email protected]>
>>> wrote:
>>>>
>>>>> Hi James and TinkerPop community,
>>>>>
>>>>> thanks a lot for starting this discussion!
>>>>> I am Vasia, Apache Flink PMC and core Gelly developer. Nice to meet
>> you
>>> ;)
>>>>>
>>>>> I'm only starting to get familiar with the TinkerPop project, but it
>>> seems
>>>>> that it can play well with Flink.
>>>>> As you already noticed, a FlinkGraphComputer should be
>> straight-forward
>>> to
>>>>> implement. Gelly has a vertex-centric API that is similar to the
>>>>> scatter-gather model [1] and a gather-sum-apply API [2] that is closer
>>> to
>>>>> the Powergraph model. These are built on top of Flink's delta
>> iteration
>>>>> operators [3], which are more generic and could also be used directly for
>>> the
>>>>> FlinkGraphComputer, if the existing Gelly abstractions won't work.
>>>>>
>>>>> Regarding the difference between stream and batch in Flink: Flink is a
>>>>> streaming dataflow engine, on top of which you can run both streaming
>>> and
>>>>> batch jobs. A batch job is simply seen by Flink as a job operating on
>> a
>>>>> finite stream. Respectively, Flink has a stream and a batch API. Gelly
>>> is
>>>>> currently built on top of the batch API, i.e. the DataSet API.
>>>>>
>>>>> James mentioned in the Flink mailing list that someone has already
>>> started
>>>>> working on a FlinkGraphComputer. Is there a JIRA for this? Let me know
>>> if
>>>>> you have questions or you think I can help in some way!
>>>>>
>>>>> Cheers,
>>>>> -Vasia.
>>>>>
>>>>> [1]: https://ci.apache.org/projects/flink/flink-docs-master/libs/gelly_guide.html#vertex-centric-iterations
>>>>> [2]: https://ci.apache.org/projects/flink/flink-docs-master/libs/gelly_guide.html#gather-sum-apply-iterations
>>>>> [3]: https://ci.apache.org/projects/flink/flink-docs-master/apis/iterations.html#delta-iterate-operator
>>>>>
>>>>> On 25 November 2015 at 17:05, James Thornton <[email protected]> wrote:
>>>>>
>>>>>> Hi Vasia -
>>>>>>
>>>>>> Welcome to TinkerPop (linking you into the Flink thread as
>>> requested)...
>>>>>>
>>>>>> - James
>>>>>>
>>>>>> On Mon, Nov 23, 2015 at 10:01 AM, Marko Rodriguez <
>>> [email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi James,
>>>>>>>
>>>>>>> Thank you for always having an ear to the tech pulse. If it wasn't
>> for
>>>>>> you,
>>>>>>> I would still be excited about XMPP and would be programming in
>>> Tcl/Tk.
>>>>>>>
>>>>>>> Given my 20-minute review of their docs ... It would be cool if like
>>> the
>>>>>>> "Table API," they also had a "Graph API" that was just TinkerPop
>>>>>>> Graph/Vertex/Edge. That could be super intrusive, so as a simple
>> step
>>>>> --
>>>>>>> they already have a "vertex-centric" API and thus, having a
>>>>>>> FlinkGraphComputer implementation seems "easy." Then from there,
>>>>> Gremlin
>>>>>>> should just work. I don't really understand the difference between
>>>>> stream
>>>>>>> and batch unless they are talking about the difference between "Storm" and
>>>>>>> "MapReduce"? Would be cool to see how TinkerPop fits into the
>>>>>>> stream-scene.
>>>>>>>
>>>>>>> Next, their fluent API is similar to Spark's and I would argue that
>>>>>>> Gremlin's API is much nicer than just low-level primitives like
>> map(),
>>>>>>> flatMap(), etc. Thus, they could really benefit from having a full
>>>>> graph
>>>>>>> query language already available for their users. (As a side note,
>> it's
>>>>>>> really nice to see more and more systems use functional/fluent APIs
>> as
>>>>>> this
>>>>>>> really trains the next generation to think like this which is
>>> important
>>>>>> as
>>>>>>> Gremlin is purely this! Hopefully the SQL model of querying starts
>> to
>>>>>> look
>>>>>>> odd to people in comparison.)
>>>>>>>
>>>>>>> I just sent out this tweet:
>>>>>>>
>> https://twitter.com/apachetinkerpop/status/668820458599530497
>>>>>>>
>>>>>>> If they seem positive, I can detail in JIRA what would be required
>> for
>>>>>>> them to have TinkerPop-support.
>>>>>>>
>>>>>>> Thanks again James,
>>>>>>> Marko.
>>>>>>>
>>>>>>> http://markorodriguez.com
>>>>>>>
>>>>>>> On Nov 19, 2015, at 12:19 PM, James Thornton <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi -
>>>>>>>>
>>>>>>>> Apache Flink has a graph API named Gelly...
>>>>>>>>
>>>>>>>>
>>>>> https://flink.apache.org/news/2015/08/24/introducing-flink-gelly.html
>>>>>>>>
>>>>>>>> ...and Flink's "dedicated support for iterative operations" should
>>>>> pair
>>>>>>>> well with Gremlin:
>>>>>>>>
>>>>>>>> https://flink.apache.org/features.html
>>>>>>>>
>>>>>>>> Has anyone dug into this yet?
>>>>>>>>
>>>>>>>> - James
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> James Thornton, http://electricspeed.com
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> James Thornton, http://electricspeed.com
>>>>>>
>>>>>
>>>
>>>
>>