RE: [Discuss graph source/sink design proposal]

Saikat Kanjilal Thu, 30 Jun 2016 22:17:22 -0700

I've started coding on this. Here are some details:

1) I've cloned the HBase sink for now and am refactoring that code to work
   with Neo4j as a start.
2) I'm focusing only on creating a sink that will perform basic CRUD
   streaming operations into Neo4j.
3) I've sent an email to the Neo4j team to figure out the details of building
   a streaming architecture with the Neo4j kernel.
4) In the meantime, how would you like to review the code? I've cloned the
   Flume repo and created a branch called flume-2035 where I will work. Should
   I put all the code on Bitbucket and send out periodic reviews? This is
   going to be a sizeable effort.
5) How should we think about Cypher-related workflows as they relate to the
   incoming streaming data? For a full overview of Cypher, see
   https://neo4j.com/developer/cypher-query-language/
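
To make the Cypher question in item 5 concrete, here is a minimal sketch of
one possible approach: map each CRUD operation to a parameterized Cypher
statement. Everything in it is an assumption for illustration only: the class
name CypherStatementSketch, the "op" and "label" event header names, the "id"
node property, and the use of the $name parameter syntax from current Cypher
versions. It does not touch the Flume or Neo4j APIs; it only builds the
statement text that a sink could hand to a driver along with the parameters.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: maps a CRUD operation on a node to parameterized Cypher text. */
public class CypherStatementSketch {

    /** The basic CRUD operations a graph sink would need to handle. */
    public enum Op { CREATE, READ, UPDATE, DELETE }

    /**
     * Builds a Cypher statement for a single node identified by an assumed "id" property.
     * Values are never concatenated into the text; they are referenced as the
     * parameters $id and $props, to be supplied separately when the statement runs.
     */
    public static String toCypher(Op op, String label) {
        // Labels cannot be passed as Cypher parameters, so validate before embedding.
        if (!label.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Invalid label: " + label);
        }
        switch (op) {
            case CREATE:
                // MERGE keeps the write idempotent if the same event is delivered twice.
                return "MERGE (n:" + label + " {id: $id}) SET n += $props";
            case READ:
                return "MATCH (n:" + label + " {id: $id}) RETURN n";
            case UPDATE:
                return "MATCH (n:" + label + " {id: $id}) SET n += $props";
            case DELETE:
                // DETACH DELETE also removes the node's relationships.
                return "MATCH (n:" + label + " {id: $id}) DETACH DELETE n";
            default:
                throw new IllegalArgumentException("Unknown op: " + op);
        }
    }

    /** Reads the assumed "op" and "label" headers of an event and builds the statement. */
    public static String fromHeaders(Map<String, String> headers) {
        String op = headers.getOrDefault("op", "create");
        String label = headers.getOrDefault("label", "Node");
        return toCypher(Op.valueOf(op.toUpperCase(Locale.ROOT)), label);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("op", "update");
        headers.put("label", "Person");
        System.out.println(fromHeaders(headers));
    }
}
```

Running main prints: MATCH (n:Person {id: $id}) SET n += $props
Using MERGE for creates is one design choice among several; it trades a
lookup per event for tolerance of redelivered events.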


Would love to get some discussion going on 2-5.
Thanks

> From: [email protected]
> Date: Wed, 29 Jun 2016 17:24:16 -0700
> Subject: Re: [Discuss graph source/sink design proposal]
> To: [email protected]
> 
> Hmm, maybe a different Kudu project? Not sure.
> 
> Anyway, this type of "changelog" thing would require support in the DB for
> streaming its write-ahead log or something. For example, we don't support
> that in Apache Kudu (incubating) -- maybe someday.
> 
> Regarding Flume, I usually think it's useful to distinguish between a
> source and a sink. They are typically written as separate classes and they
> represent different interfaces at the Flume Java API level.
> 
> So, how would one write a streaming database source? That really depends on
> the database and the APIs it provides for that.
> 
> Mike
> 
> On Tue, Jun 28, 2016 at 8:30 AM, Saikat Kanjilal <[email protected]>
> wrote:
> 
> > :) I'm using Kudu at work at the moment to troubleshoot some Tomcat
> > issues. Regarding where to keep the source code, I would say for now
> > let's go with the plugin approach and revisit the "where does the code live"
> > conversation later. One thing I do want to discuss is that the plugin will
> > act as a source or a sink depending on configuration. If the plugin acts
> > as a source, we need a mechanism (like a daemon in syslog) to stream changes
> > in real time from a graph DB into Flume. I was wondering if there are any
> > past approaches to this that I can follow. I may need to dig into the neo4j
> > kernel to see where we can inject something like this.
> > Thoughts on that?
> >
> > > From: [email protected]
> > > Date: Tue, 28 Jun 2016 00:27:45 -0700
> > > Subject: Re: [Discuss graph source/sink design proposal]
> > > To: [email protected]
> > >
> > > Hi Saikat,
> > > Please see my thoughts inline. This is how I think about this stuff;
> > > others may think about it differently.
> > >
> > > On Mon, Jun 27, 2016 at 8:45 PM, Saikat Kanjilal <[email protected]>
> > > wrote:
> > >
> > > > Exactly right, I'm proposing we create a graph sink for flume while
> > > > keeping the flume core intact.
> > >
> > >
> > > As you are probably aware, sources and sinks don't have to be part of the
> > > main Apache Flume source tree to be used with Flume. The plugins.d
> > > mechanism described in [1] makes building and integrating separate
> > > plugins into Flume an easy thing to do at deployment time.
> > >
> > > In another project I work on, Apache Kudu (incubating), we have a Flume
> > > Kudu sink committed in the main source tree [2]. We may at some point
> > > propose to move it into the Flume source tree, but for now (for testing
> > > and API stability reasons) it's easier to keep it in the Kudu source tree.
> > >
> > > Likewise, you could implement a Flume Neo4J sink and post it up on GitHub
> > > (or maybe in the Neo4J tree?). Donating it to the Apache Flume project
> > > once it's in decent shape may make sense at some point, especially if the
> > > dependencies are easy to share and integrate into the Flume project.
> > > However, I wouldn't say that it's a foregone conclusion that it really
> > > needs to be part of the Flume source tree. Assuming you need the sink,
> > > and are going to implement it anyway, then maybe we can defer the
> > > discussion of whether to include it in the Flume source tree until later.
> > > One of the things I try to keep in mind when integrating new plugin code
> > > is whether the project will be able to support the maintenance burden of
> > > the new code.
> > > > > In reading from a graph db we need a mechanism to stream data from the
> > > > graph store into flume.
> > > >
> > >
> > > Yes, I'd say it could potentially make sense to create a Flume Neo4J
> > > source as well. I think the same logic as above would still apply.
> > >
> > > Regards,
> > > Mike
> > >
> > > [1] https://flume.apache.org/FlumeUserGuide.html#installing-third-party-plugins
> > > [2] https://github.com/apache/incubator-kudu/tree/master/java/kudu-flume-sink
> >
> >
