I suppose I'll need to do the same with neo4j. Thanks

Sent from my iPhone

> On Jul 13, 2016, at 6:21 PM, Mike Percy <[email protected]> wrote:
>
> For the Flume-Kafka integration we start up Kafka mini clusters in the unit
> tests. It depends on the server. The project doesn't have any permanent
> infrastructure in place with long running servers.
>
> Mike
>
> On Wed, Jul 13, 2016 at 5:37 PM, Saikat Kanjilal <[email protected]>
> wrote:
>
>> Mike et al,
>>
>> Out of curiosity, how do committers usually run integration tests when
>> doing flume sink development? At some point I will have the graph sink
>> talking to neo4j, and I would really rather not test everything locally,
>> since local performance would not reflect the actual sink performance.
>> Any ideas on how to get past this? I'm not there yet, but in a few weeks
>> I'll need to start perf/integration testing.
>>
>>
>> Thanks in advance.
>>
>>
>> ________________________________
>> From: Saikat Kanjilal <[email protected]>
>> Sent: Saturday, July 9, 2016 8:16 AM
>> To: [email protected]
>> Subject: Re: [Discuss graph source/sink design proposal]
>>
>> Mike et al,
>>
>> To clarify again, I'm starting with the hbase sink and modifying it to
>> match the graph use case. This is probably why you saw the hbase stuff
>> still left over.
>> In a nutshell the design will look like the following:
>>
>>
>> flume->neo4j (sink workflow)
>>
>> We batch events up from flume, use the neo4j bolt driver to convert the
>> batch of events into Cypher statements, and then send the data in bulk
>> into neo4j. One open question here is how many events go in a batch and
>> whether this should be dynamically configurable.
>>
>>
>> neo4j->flume (source workflow)
>>
>> We add event listeners inside neo4j and then send data back into flume
>> through these listeners. Here we'd need to be really careful about
>> sending every single event; a batching strategy might also make sense
>> here, but it takes away the concept of real time updates.
>>
>>
>> More later as I make more progress. Also, your criteria for acceptance of
>> this sink are no different from those for accepting contributions to any
>> other open source project. I guess I'd also like to know if there's
>> interest from the community in connecting flume with neo4j, as that would
>> generate more feedback on the design.
>>
>> Here's a blurb on the new neo4j drivers for Java and other languages:
>>
>> https://neo4j.com/blog/neo4j-3-0-language-drivers/
>>
>> Thanks
>>
>>
>> ________________________________
>> From: Mike Percy <[email protected]>
>> Sent: Friday, July 8, 2016 6:22 PM
>> To: [email protected]
>> Subject: Re: [Discuss graph source/sink design proposal]
>>
>> Hi Saikat, please see my responses inline.
>>
>> On Thu, Jul 7, 2016 at 8:50 PM, Saikat Kanjilal <[email protected]>
>> wrote:
>>
>>> Ok moved the code to here:
>>> https://bitbucket.org/skanjila/flume-ng-graph-sink
>>
>> It looks like mostly still HBaseSink code right now, just with a different
>> package name. I only looked at the Async one and that's what I found.
>>
>>> Also I am exploring using the https://github.com/neo4j/neo4j-java-driver
>>> using the bolt protocol to connect to neo4j to stream events
>>
>> I don't know anything about Neo4J personally. Unfortunately I don't have
>> time to really participate in development of this new sink using technology
>> I have no use for, myself. Maybe there are others on this list that have
>> the time and interest to help.
>>
>>> Looking forward to getting feedback on this effort as y'all have time.
>>
>> I apologize for not having the time to provide much guidance beyond the
>> capabilities of Flume itself.
>>
>> In the future, as a committer on Flume, I would personally consider merging
>> Neo4J support into the Flume source tree if the following conditions were
>> met:
>>
>> 1. Strong feedback from others that this connector is desired by multiple
>> members of the community
>> 2. An implementation that is well designed, tested, and production-grade
>> 3. A likely long-term maintainer (maybe that is you?)
>>
>> The reason I hesitate to add more integrations into the core is that if
>> this breaks, and someone is using it, we will have to fix it. If someone
>> asks a question on the mailing lists, we will have to attempt to answer it.
>>
>> Regards,
>> Mike
>>
>>
>>> From: Saikat Kanjilal <[email protected]>
>>> Sent: Thursday, July 7, 2016 9:31 AM
>>> To: [email protected]
>>> Subject: Re: [Discuss graph source/sink design proposal]
>>>
>>> Would it be ok to use bitbucket instead? I have indeed extended
>>> AbstractSink to build the graph sink, and I will depend on flume-ng-core
>>> in my pom as well.
>>>
>>> Thanks and feel free to respond on the Cypher discussion as well as the
>>> other items I mentioned earlier.
>>>
>>>
>>> ________________________________
>>> From: Mike Percy <[email protected]>
>>> Sent: Monday, July 4, 2016 12:03 PM
>>> To: [email protected]
>>> Subject: Re: [Discuss graph source/sink design proposal]
>>>
>>> Hi Saikat,
>>> I recommend you use GitHub. Private branches in ASF repos are only
>>> available to committers.
>>>
>>> Regarding forking Flume, you should not need to do that. Just depend on
>>> flume-ng-core in your pom and extend AbstractSink. Maven will pull in your
>>> deps.
>>>
>>> I'm out of town for the next few days but I'll try to respond in more
>>> detail to your design notes when I'm back in town.
>>>
>>> Mike
>>>
>>> Sent from my iPhone
>>>
>>>> On Jul 4, 2016, at 6:59 AM, Saikat Kanjilal <[email protected]> wrote:
>>>>
>>>> Hari/Mike et al,
>>>>
>>>> I need a place to put interim checkins related to this work. Is it
>>>> possible to get write privileges into a private branch so that I can
>>>> commit my code at intermediate junctures? I can also put it in bitbucket,
>>>> but would rather not have to create yet another place for the code to
>>>> live if it'll eventually end up in the flume repo.
>>>>
>>>>
>>>> Thanks in advance
>>>>
>>>>
>>>> ________________________________
>>>> From: Saikat Kanjilal <[email protected]>
>>>> Sent: Thursday, June 30, 2016 10:16 PM
>>>> To: [email protected]
>>>> Subject: RE: [Discuss graph source/sink design proposal]
>>>>
>>>> So I've started the coding efforts on this, here are some details:
>>>> 1) I've cloned the hbase sink for now and am refactoring all of that
>>>> code to work with neo4j as a start
>>>> 2) I'm only focusing on creating a sink that will perform basic CRUD
>>>> streaming operations into neo4j
>>>> 3) I've sent an email to the neo4j guys to figure out details around
>>>> building a streaming architecture with the neo4j kernel
>>>> 4) In the meantime how would you guys like to review the code? I've
>>>> cloned the flume repo and have created a branch called flume-2035 where
>>>> I will work. Should I put all the code in bitbucket and send out
>>>> periodic reviews? This is going to be a sizeable effort
>>>> 5) How should we think about Cypher related workflows as they relate to
>>>> the streaming data coming in? To see the full flavor of Cypher go here:
>>>> https://neo4j.com/developer/cypher-query-language/
>>>>
>>>> Would love to get some discussion going on 2-5.
>>>> Thanks
>>>>
>>>>> From: [email protected]
>>>>> Date: Wed, 29 Jun 2016 17:24:16 -0700
>>>>> Subject: Re: [Discuss graph source/sink design proposal]
>>>>> To: [email protected]
>>>>>
>>>>> Hmm, maybe a different Kudu project? Not sure.
>>>>>
>>>>> Anyway, this type of "changelog" thing would require support in the DB for
>>>>> streaming its write-ahead log or something. For example, we don't support
>>>>> that in Apache Kudu (incubating) -- maybe someday.
>>>>>
>>>>> Regarding Flume, I usually think it's useful to distinguish between a
>>>>> source and a sink. They are typically written as separate classes and they
>>>>> represent different interfaces at the Flume Java API level.
>>>>>
>>>>> So, how would one write a streaming database source? That really depends on
>>>>> the database and the APIs it provides for that.
>>>>>
>>>>> Mike
>>>>>
>>>>> On Tue, Jun 28, 2016 at 8:30 AM, Saikat Kanjilal <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> :) I'm using Kudu at work at the moment to troubleshoot some Tomcat
>>>>>> issues. Regarding where to keep the source code, I would say for now
>>>>>> let's go with the plugin approach and revisit the "where does the code
>>>>>> live" conversation later.
>>>>>> One thing I do want to discuss is that the plugin will
>>>>>> act as a source or a sink depending on configuration, so if the plugin
>>>>>> acts as a source we need a mechanism (like a daemon in syslog) to stream
>>>>>> changes in real time from a graphdb into flume. I was wondering if there
>>>>>> are any past approaches around this that I can follow. I may need to dig
>>>>>> into the neo4j kernel to see where we can inject something like this.
>>>>>> Thoughts on that?
>>>>>>
>>>>>>> From: [email protected]
>>>>>>> Date: Tue, 28 Jun 2016 00:27:45 -0700
>>>>>>> Subject: Re: [Discuss graph source/sink design proposal]
>>>>>>> To: [email protected]
>>>>>>>
>>>>>>> Hi Saikat,
>>>>>>> Please see my thoughts inline. This is how I think about this stuff;
>>>>>>> others may think about it differently.
>>>>>>>
>>>>>>> On Mon, Jun 27, 2016 at 8:45 PM, Saikat Kanjilal <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Exactly right, I'm proposing we create a graph sink for flume while
>>>>>>>> keeping the flume core intact.
>>>>>>>
>>>>>>>
>>>>>>> As you are probably aware, sources and sinks don't have to be part of the
>>>>>>> main Apache Flume source tree to be used with Flume. The plugins.d
>>>>>>> mechanism described in [1] makes building and integrating separate
>>>>>>> plugins into Flume an easy thing to do at deployment time.
>>>>>>>
>>>>>>> In another project I work on, Apache Kudu (incubating), we have a Flume
>>>>>>> Kudu sink committed in the main source tree [2]. We may at some point
>>>>>>> propose to move it into the Flume source tree, but for now (for testing
>>>>>>> and API stability reasons) it's easier to keep it in the Kudu source tree.
>>>>>>>
>>>>>>> Likewise, you could implement a Flume Neo4J sink and post it up on GitHub
>>>>>>> (or maybe in the Neo4J tree?).
>>>>>>> Donating it to the Apache Flume project once
>>>>>>> it's in decent shape may make sense at some point, especially if the
>>>>>>> dependencies are easy to share and integrate into the Flume project.
>>>>>>> However, I wouldn't say that it's a foregone conclusion that it really
>>>>>>> needs to be part of the Flume source tree. Assuming you need the sink,
>>>>>>> and are going to implement it anyway, then maybe we can defer the
>>>>>>> discussion of whether to include it in the Flume source tree until later.
>>>>>>> One of the things I try to keep in mind when integrating new plugin code
>>>>>>> is whether the project will be able to support the maintenance burden of
>>>>>>> the new code.
>>>>>>>
>>>>>>>> In reading from a graph db we need a mechanism to stream data from the
>>>>>>>> graph store into flume.
>>>>>>>
>>>>>>> Yes, I'd say it could potentially make sense to create a Flume Neo4J
>>>>>>> source as well. I think the same logic as above would still apply.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Mike
>>>>>>>
>>>>>>> [1] https://flume.apache.org/FlumeUserGuide.html#installing-third-party-plugins
>>>>>>> [2] https://github.com/apache/incubator-kudu/tree/master/java/kudu-flume-sink
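
As a rough illustration of the sink workflow proposed in this thread (batch events up, turn each batch into a Cypher statement, send the batch in bulk), here is a minimal Java sketch of just the batching step. It deliberately uses neither the Flume nor the Neo4j driver API, so it runs on the JDK alone. Everything named here is hypothetical and not taken from the thread: the class CypherBatcher, the methods toRow and toBatches, the Event label, and the rows parameter. The $rows parameter syntax is also an assumption about the target Neo4j version. The batch size is a constructor argument, which is one possible answer to the "should this be dynamically configurable" question.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: groups event rows into fixed-size batches of Cypher parameters. */
final class CypherBatcher {
    // One parameterized statement reused for every batch.
    // The Event label and the rows parameter name are illustrative assumptions.
    static final String STATEMENT = "UNWIND $rows AS row CREATE (e:Event) SET e = row";

    private final int batchSize;

    CypherBatcher(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
    }

    /** Flattens one event (string headers plus a UTF-8 body) into a property map. */
    static Map<String, Object> toRow(Map<String, String> headers, byte[] body) {
        Map<String, Object> row = new LinkedHashMap<>(headers);
        row.put("body", new String(body, StandardCharsets.UTF_8));
        return row;
    }

    /** Returns one parameter map per batch; each holds at most batchSize rows under "rows". */
    List<Map<String, Object>> toBatches(List<Map<String, Object>> rows) {
        List<Map<String, Object>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            Map<String, Object> params = new LinkedHashMap<>();
            // Copy the sublist so later changes to the input do not affect the batch.
            params.put("rows", new ArrayList<>(rows.subList(start, end)));
            batches.add(params);
        }
        return batches;
    }
}
```

In a real sink, each returned parameter map would presumably be handed to the bolt driver together with STATEMENT, one batch per transaction. That wiring is left out because it depends on the driver version in use.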
