Got it.

On Mon, Jun 13, 2016 at 10:05 PM, Saikat Kanjilal <[email protected]> wrote:
> That's a responsibility of the graph db, not flume. Flume is responsible
> for delivering the events and has no understanding of the connectivity of
> the data. The goal in using flume is to connect incoming data that is
> heterogeneous and transform that data before dumping it into the graph db.
>
> Sent from my iPhone
>
> > On Jun 13, 2016, at 11:09 AM, Lior Zeno <[email protected]> wrote:
> >
> > I got this part. How are events linked together? Do you expect an
> > adjacency list incorporated in the header?
> >
> > On Mon, Jun 13, 2016 at 8:59 PM, Saikat Kanjilal <[email protected]>
> > wrote:
> >
> >> The use case is a flume developer wanting to connect data coming into
> >> and out of flume sinks/sources to a graph database
> >>
> >> Sent from my iPhone
> >>
> >>> On Jun 13, 2016, at 10:55 AM, Lior Zeno <[email protected]> wrote:
> >>>
> >>> I'm not sure that I follow here. Can you please give a detailed
> >>> use-case?
> >>>
> >>>> On Mon, Jun 13, 2016 at 7:20 AM, Lior Zeno <[email protected]> wrote:
> >>>>
> >>>> Thanks. I'll review this and share my comments later on today.
> >>>>> On Jun 13, 2016 2:30 AM, "Saikat Kanjilal" <[email protected]> wrote:
> >>>>>
> >>>>> Motivation/Design: The graph sink/source plugin will be used to apply
> >>>>> custom transformations to connected data dynamically and send the
> >>>>> data to any sink; examples of destination sinks include
> >>>>> elasticsearch, relational databases, spark rdd, etc. Note that this
> >>>>> plugin will serve as a source and a sink depending on the
> >>>>> configuration. For v1 I am targeting plugging into the neo4j
> >>>>> database using the neo4j-jdbc interface
> >>>>> (https://github.com/larusba/neo4j-jdbc) to build http payloads to
> >>>>> talk to neo4j. Once in place, our neo4j interface will allow us to
> >>>>> build generic interfaces and plug in any graph store in the future.
> >>>>>
> >>>>> The design will consist of a hybrid piece of infrastructure serving
> >>>>> both as a source and a sink connected to the current flume
> >>>>> infrastructure (since all the current sinks and sources are living in
> >>>>> their own directories, I would suggest this live somewhere else in
> >>>>> the flume directory structure). Listed below are some classes I have
> >>>>> partially configured to kick off this discussion.
> >>>>>
> >>>>> NeoRestClient
> >>>>> Roles and Responsibilities: Interface to neo4j, unpack and pack data
> >>>>> structures to perform CRUD operations on a local or remote neo4j
> >>>>> instance
> >>>>> APIS:
> >>>>> //inputs flume event
> >>>>> //outputs flume data structure identifying success metrics around the
> >>>>> //operation
> >>>>> //description: transform the flume event into a graph node
> >>>>> insertNode(NeoNode nodeToInsert)
> >>>>> searchNode(NeoNode nodeToSearch,Algorithm useAStarOrDijkstra)
> >>>>> deleteNode(NeoNode nodeToDelete)
> >>>>>
> >>>>> Note that I would also like to offer up the chance to present Cypher
> >>>>> queries (http://neo4j.com/developer/cypher-query-language/) to the
> >>>>> source/sink infrastructure
> >>>>>
> >>>>> Neo4jDynamicSerializer
> >>>>> Roles and responsibilities: serialize flume headers and body and use
> >>>>> the Neo4jRestClient to perform CRUD on neo4j
> >>>>>
> >>>>> Both the source and the sink will use the same infrastructure above.
> >>>>>
> >>>>> That should be enough of a first cut for design/motivation and JIRA
> >>>>> details. I would love to kick off the discussion at this point.
> >>>>> Thanks in advance
> >>>>>
> >>>>>> From: [email protected]
> >>>>>> To: [email protected]
> >>>>>> Subject: [Discuss graph source/sink design proposal]
> >>>>>> Date: Sun, 12 Jun 2016 15:01:14 -0700
> >>>>>>
> >>>>>> Jira with details here: https://issues.apache.org/jira/browse/FLUME-2035
> >>>>>>
> >>>>>> Please respond with your questions.
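
To make the NeoRestClient proposal quoted above a bit more concrete, here is a rough Java sketch of how the three calls might fit together. Only the names NeoRestClient, NeoNode, Algorithm, insertNode, searchNode and deleteNode come from the proposal. The return types, the NeoNode fields, the fromEvent helper and the InMemoryNeoClient class are assumptions made for illustration. The event is modelled as a plain header map plus a byte[] body so the sketch compiles without Flume or neo4j on the classpath, and the in-memory client ignores the algorithm argument.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Named in the proposal; the constants are inferred from "useAStarOrDijkstra".
enum Algorithm { A_STAR, DIJKSTRA }

// Hypothetical node shape: an id plus string properties.
final class NeoNode {
    final String id;
    final Map<String, String> properties;

    NeoNode(String id, Map<String, String> properties) {
        this.id = id;
        this.properties = new HashMap<>(properties);
    }

    // Assumption: headers become node properties and the body is stored
    // as a UTF-8 string under the key "body".
    static NeoNode fromEvent(String id, Map<String, String> headers, byte[] body) {
        Map<String, String> props = new HashMap<>(headers);
        props.put("body", new String(body, StandardCharsets.UTF_8));
        return new NeoNode(id, props);
    }
}

// Method names follow the proposal; return types are assumptions.
interface NeoRestClient {
    boolean insertNode(NeoNode nodeToInsert);
    Optional<NeoNode> searchNode(NeoNode nodeToSearch, Algorithm useAStarOrDijkstra);
    boolean deleteNode(NeoNode nodeToDelete);
}

// Hypothetical in-memory stand-in so the flow can be exercised without neo4j.
final class InMemoryNeoClient implements NeoRestClient {
    private final Map<String, NeoNode> nodes = new HashMap<>();

    @Override
    public boolean insertNode(NeoNode nodeToInsert) {
        // Returns false if a node with the same id already exists.
        return nodes.putIfAbsent(nodeToInsert.id, nodeToInsert) == null;
    }

    @Override
    public Optional<NeoNode> searchNode(NeoNode nodeToSearch, Algorithm useAStarOrDijkstra) {
        // Lookup by id only; the algorithm argument is ignored in this stand-in.
        return Optional.ofNullable(nodes.get(nodeToSearch.id));
    }

    @Override
    public boolean deleteNode(NeoNode nodeToDelete) {
        return nodes.remove(nodeToDelete.id) != null;
    }
}

final class GraphSinkSketch {
    public static void main(String[] args) {
        NeoRestClient client = new InMemoryNeoClient();
        Map<String, String> headers = new HashMap<>();
        headers.put("host", "web-1");
        NeoNode node = NeoNode.fromEvent("evt-1", headers,
                "login".getBytes(StandardCharsets.UTF_8));

        System.out.println("inserted: " + client.insertNode(node));
        System.out.println("found: " + client.searchNode(node, Algorithm.DIJKSTRA).isPresent());
        System.out.println("deleted: " + client.deleteNode(node));
    }
}
```

A real client would replace InMemoryNeoClient with one that talks to neo4j, and the open question from this thread (how edges between events are expressed) is deliberately left out of the sketch.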
