That's a responsibility of the graph db, not flume. Flume is responsible for 
delivering the events and has no understanding of the connectivity of the data.  
The goal in using flume is to take incoming data that is heterogeneous and 
transform that data before dumping it into the graph db.
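
To make that concrete: one way to carry connectivity without flume having to 
understand it is for the producer (or an interceptor) to stamp linkage into 
the event headers as opaque strings, which the graph sink alone interprets. 
A minimal sketch, assuming made-up header names ("nodeId", "adjacentTo"):

    import java.util.List;
    import java.util.Map;
    import java.util.UUID;
    import org.apache.flume.Event;
    import org.apache.flume.interceptor.Interceptor;

    // Sketch only: tag each event with a node id and an adjacency list.
    // Flume just ferries these headers; only the graph sink reads them.
    public class GraphHeaderInterceptor implements Interceptor {
      @Override public void initialize() {}

      @Override
      public Event intercept(Event event) {
        Map<String, String> headers = event.getHeaders();
        headers.putIfAbsent("nodeId", UUID.randomUUID().toString());
        // e.g. headers.put("adjacentTo", "id1,id2")  -- a comma-separated
        // adjacency list supplied by the upstream producer
        return event;
      }

      @Override
      public List<Event> intercept(List<Event> events) {
        for (Event e : events) intercept(e);
        return events;
      }

      @Override public void close() {}
    }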

Sent from my iPhone

> On Jun 13, 2016, at 11:09 AM, Lior Zeno <[email protected]> wrote:
> 
> I got this part. How are events linked together? Do you expect an adjacency
> list incorporated in the header?
> 
> On Mon, Jun 13, 2016 at 8:59 PM, Saikat Kanjilal <[email protected]>
> wrote:
> 
>> The use case is a flume developer wanting to connect data coming into and
>> out of flume sinks/sources to a graph database
>> 
>> Sent from my iPhone
>> 
>>> On Jun 13, 2016, at 10:55 AM, Lior Zeno <[email protected]> wrote:
>>> 
>>> I'm not sure that I follow here. Can you please give a detailed use-case?
>>> 
>>>> On Mon, Jun 13, 2016 at 7:20 AM, Lior Zeno <[email protected]> wrote:
>>>> 
>>>> Thanks. I'll review this and share my comments later on today.
>>>>> On Jun 13, 2016 2:30 AM, "Saikat Kanjilal" <[email protected]> wrote:
>>>>> 
>>>>> Motivation/Design: The graph source/sink plugin will be used to apply
>>>>> custom transformations to connected data and dynamically apply these
>>>>> transformations to send data to any sink; examples of destination
>>>>> sinks include elasticsearch, relational databases, spark RDDs, etc.
>>>>> Note that this plugin will serve as a source and a sink depending
>>>>> on the configuration.  For v1 I am targeting plugging into the neo4j
>>>>> database using the neo4j-jdbc interface (
>>>>> https://github.com/larusba/neo4j-jdbc)
>>>>> to build http payloads to talk to neo4j.  Once in place, our neo4j
>>>>> interface will allow us to build generic interfaces and plug in any
>>>>> graph store in the future.
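>>>>> As a rough illustration of the v1 wiring (a sketch only; the JDBC
>>>>> URL scheme and the Cypher text are my assumptions, not settled API):
>>>>> 
>>>>>   import java.sql.Connection;
>>>>>   import java.sql.DriverManager;
>>>>>   import java.sql.PreparedStatement;
>>>>> 
>>>>>   // Sketch: open a JDBC connection to neo4j and push one node
>>>>>   // through a Cypher statement.  The URL format is assumed from
>>>>>   // the larusba neo4j-jdbc README.
>>>>>   public class NeoJdbcSketch {
>>>>>     public static void main(String[] args) throws Exception {
>>>>>       try (Connection con = DriverManager.getConnection(
>>>>>               "jdbc:neo4j://localhost:7474", "neo4j", "password")) {
>>>>>         String cypher = "CREATE (n:FlumeEvent {id: ?, body: ?})";
>>>>>         try (PreparedStatement ps = con.prepareStatement(cypher)) {
>>>>>           ps.setString(1, "event-123");
>>>>>           ps.setString(2, "payload");
>>>>>           ps.executeUpdate();
>>>>>         }
>>>>>       }
>>>>>     }
>>>>>   }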
>>>>> The design will consist of a hybrid piece of infrastructure serving
>>>>> both as a source and a sink connected to the current flume
>>>>> infrastructure (since all the current sinks and sources live in
>>>>> their own directories, I would suggest this live somewhere else in
>>>>> the flume directory structure).  Listed below are some classes I
>>>>> have partially configured to kick off this discussion:
>>>>> NeoRestClient
>>>>> Roles and responsibilities: interface to neo4j; unpack and pack data
>>>>> structures to perform CRUD operations on a local or remote neo4j
>>>>> instance
>>>>> APIs:
>>>>> // inputs: flume event
>>>>> // outputs: flume data structure identifying success metrics around
>>>>> // the operation
>>>>> // description: transform the flume event into a graph node
>>>>> insertNode(NeoNode nodeToInsert)
>>>>> searchNode(NeoNode nodeToSearch, Algorithm useAStarOrDijkstra)
>>>>> deleteNode(NeoNode nodeToDelete)
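>>>>> 
>>>>> A minimal Java sketch of how that could be declared (NeoResult is a
>>>>> placeholder name I am inventing for the "success metrics" structure
>>>>> above; nothing here is settled):
>>>>> 
>>>>>   // Sketch of the proposed client interface.
>>>>>   public interface NeoRestClient {
>>>>>     enum Algorithm { A_STAR, DIJKSTRA }
>>>>> 
>>>>>     // transform the flume event into a graph node and create it
>>>>>     NeoResult insertNode(NeoNode nodeToInsert);
>>>>> 
>>>>>     // traverse from the given node using A* or Dijkstra
>>>>>     NeoResult searchNode(NeoNode nodeToSearch, Algorithm useAStarOrDijkstra);
>>>>> 
>>>>>     NeoResult deleteNode(NeoNode nodeToDelete);
>>>>>   }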
>>>>> 
>>>>> 
>>>>> Note that I would also like to offer up the chance to present Cypher
>>>>> queries (http://neo4j.com/developer/cypher-query-language/) to the
>>>>> source/sink infrastructure.
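>>>>> 
>>>>> For example, a hypothetical passthrough method (not in the class
>>>>> list above) might accept a raw Cypher string:
>>>>> 
>>>>>   // Hypothetical addition for illustration only.
>>>>>   NeoResult executeCypher(String cypherQuery);
>>>>>   // e.g. executeCypher("MATCH (n:FlumeEvent {id: 'event-123'}) RETURN n")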
>>>>> 
>>>>> Neo4jDynamicSerializer
>>>>> Roles and responsibilities: serialize flume headers and body and use
>>>>> the NeoRestClient to perform CRUD on neo4j
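>>>>> 
>>>>> A rough sketch of the serializer's core translation step (the
>>>>> NeoNode shape, its setProperty method, and the header-to-property
>>>>> mapping are assumptions for illustration):
>>>>> 
>>>>>   import org.apache.flume.Event;
>>>>> 
>>>>>   public class Neo4jDynamicSerializer {
>>>>>     private final NeoRestClient client;
>>>>> 
>>>>>     public Neo4jDynamicSerializer(NeoRestClient client) {
>>>>>       this.client = client;
>>>>>     }
>>>>> 
>>>>>     // Map each flume header onto a node property, carry the body as
>>>>>     // a payload property, then hand the node to the rest client.
>>>>>     public void serialize(Event event) {
>>>>>       NeoNode node = new NeoNode();
>>>>>       event.getHeaders().forEach(node::setProperty);
>>>>>       node.setProperty("body", new String(event.getBody()));
>>>>>       client.insertNode(node);
>>>>>     }
>>>>>   }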
>>>>> 
>>>>> 
>>>>> Both the source and the sink will share the infrastructure described
>>>>> above.
>>>>> 
>>>>> 
>>>>> That should be enough of a first cut for design/motivation and JIRA
>>>>> details; I would love to kick off the discussion at this point.
>>>>> Thanks in advance
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> From: [email protected]
>>>>>> To: [email protected]
>>>>>> Subject: [Discuss graph source/sink design proposal]
>>>>>> Date: Sun, 12 Jun 2016 15:01:14 -0700
>>>>>> 
>>>>>> Jira with details here:
>>>>> https://issues.apache.org/jira/browse/FLUME-2035
>>>>>> 
>>>>>> Please respond with your questions.
>> 
