Hey all, great to see this discussion. I'm part of the Flink PMC and would love to see some of Flink's connectors added to Bahir. I can also help Robert with maintenance on the Flink side of things.
+1 to the multiple-repo approach.

Best,
Ufuk

On Tue, Aug 16, 2016 at 2:27 PM, <dev-h...@bahir.apache.org> wrote:
> dev Digest of: thread.362
>
> [DISCUSS] Adding streaming connectors from Apache Flink to Bahir
>   362 by: Robert Metzger
>   363 by: Steve Loughran
>   370 by: Luciano Resende
>   371 by: Robert Metzger
>   374 by: Luciano Resende
>   376 by: Ted Yu
>   377 by: Robert Metzger
>   380 by: Steve Loughran
>   381 by: Luciano Resende
>   382 by: Luciano Resende
>   384 by: Robert Metzger
---------- Forwarded message ----------
From: Robert Metzger <rmetz...@apache.org>
To: dev@bahir.apache.org
Date: Thu, 11 Aug 2016 10:54:17 +0200
Subject: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir

Hello Bahir community,

The Apache Flink community is currently discussing how to handle incoming (streaming) connector contributions [1]. The Flink community wants to limit the maintained connectors to the most popular ones, but we don't want to reject valuable code contributions without offering a good alternative. Apache Bahir is among the options we are currently discussing. From the Bahir announcement, I got the impression that the project is also open to connectors from projects other than Apache Spark.

Initially, we would move some of our current connectors here (redis, flume, nifi), and there are also some pending contributions in Flink that we would redirect to Bahir as well.

So what's your opinion on this?

Regards,
Robert

[1] http://mail-archives.apache.org/mod_mbox/flink-dev/201608.mbox/%3CCAGr9p8CAN8KQTM6%2B3%2B%3DNv8M3ggYEE9gSqdKaKLQiWsWsKzZ21Q%40mail.gmail.com%3E

---------- Forwarded message ----------
From: Steve Loughran <ste...@apache.org>
To: dev@bahir.apache.org
Date: Thu, 11 Aug 2016 11:04:26 +0200
Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir

I can see benefits from this, provided we get some help from the Flink people in maintaining and testing the stuff.
---------- Forwarded message ----------
From: Luciano Resende <luckbr1...@gmail.com>
To: dev@bahir.apache.org
Date: Thu, 11 Aug 2016 04:50:12 -0700
Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir

+1. Let me know when you guys are ready and I can create a bahir-flink git repository.
--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

---------- Forwarded message ----------
From: Robert Metzger <rmetz...@apache.org>
To: dev@bahir.apache.org
Date: Thu, 11 Aug 2016 14:42:33 +0200
Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir

@Steve: The plan is that Flink committers also help out here with reviewing, releasing, and other community activities (but I suspect the activity will be much lower; otherwise, we would not be discussing removing some of the connectors from Flink).

@Luciano: So the idea is to have separate repositories for each project contributing connectors? I'm wondering if it makes sense to keep the code in the same repository to get some synergies (the release scripts, CI, documentation, a common parent pom with rat, etc.). Otherwise, it would maybe make more sense to create a Bahir-style project for Flink, to avoid maintaining completely disjoint codebases in the same JIRA, mailing list, etc.

---------- Forwarded message ----------
From: Luciano Resende <luckbr1...@gmail.com>
To: dev@bahir.apache.org
Date: Thu, 11 Aug 2016 09:03:39 -0700
Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir

But we most likely would have very different release schedules for the different sets of extensions: Spark extensions will tend to follow Spark release cycles, and Flink extensions will follow Flink release cycles. As for the overhead, I believe the release scripts might be the one piece that would be replicated, but I can volunteer for the infrastructure overhead for now. All the rest, such as JIRA, mailing list, etc., will be common. But anyway, I don't want to make this an issue for Flink bringing the extensions here, so if you have a strong preference for having everything in the same repo, we could start with that.

Thoughts?

---------- Forwarded message ----------
From: Ted Yu <yuzhih...@gmail.com>
To: dev@bahir.apache.org
Date: Thu, 11 Aug 2016 09:13:24 -0700
Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir

Having Flink connectors in the same repo seems to make more sense at the moment.

Certain artifacts can be shared between the two types of connectors.

Flink seems to have more frequent releases recently, but Bahir doesn't have to follow each Flink patch release.

Just my two cents.
---------- Forwarded message ----------
From: Robert Metzger <rmetz...@apache.org>
To: dev@bahir.apache.org
Date: Thu, 11 Aug 2016 20:41:00 +0200
Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir

Thank you for your responses.

@Luciano: I don't have a strong preference for either of the two options, but I would like to understand their implications before we start setting up the infrastructure. Regarding the release cycle: for the Flink connectors, I would actually try to make the release cycle dependent on the connectors, not so much on Flink itself. In my experience, connectors can benefit from a more frequent release schedule. For example, Kafka seems to have released new versions quite frequently recently, or at least the release cycles of Kafka and Flink are not aligned ;) So maybe it would make sense for Bahir to release independently of the engine projects, on a monthly or bi-monthly schedule, with an independent versioning scheme.

@Ted: Flink has bugfix releases quite frequently, but major releases are at an okay cadence (3-4 months in between). Since 1.0.0, Flink provides interface stability, so there should not be an issue with independent connector releases.
---------- Forwarded message ----------
From: Steve Loughran <ste...@apache.org>
To: dev@bahir.apache.org
Date: Thu, 11 Aug 2016 23:18:32 +0200
Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir

Thinking some more.

To an extent, Bahir is currently mostly a home for some connectors and things which were orphaned by the main Spark team, giving them an ASF home. Luciano has been putting in lots of work getting a release out in sync with the Spark release.

I have some plans to contribute some other Spark-related things there, so again, an ASF home and a test & release process (some YARN driver plugins for ATS integration, and another I plan to write for YARN registry binding). Again, some stuff unloved by the core Spark team.

Ideally, Flink should be growing its user/dev base, recruiting everyone who wants to get patches in and getting them to work on those JIRAs. That's the community-growth part of an ASF project. Having some orphan stuff isn't ideal; it's the perennial "contrib" problem of projects.

Hadoop had a big purge of contrib stuff in the move to Hadoop 2 & Maven, though we've been adding stuff in hadoop-tools, especially related to object stores and the like. There's now a fairly harsh-but-needed policy there: no contributions which can't be tested during a release. It's a pain, as for some code changes I need to test against AWS S3, Azure, two OpenStack endpoints, and soon a Chinese one. We could have been harsh and said "stay on github", but having it in offers some benefits:
- a synchronized release schedule (good for Hadoop; bad if the contributors want to release more frequently)
- the Hadoop team gets some control over what's going on there
- the code review process lets us improve quality; we're getting metrics in, etc.
- it works well with my plan to have an explicit object store API, extending FileSystem with specific and efficient blobstore ops (put(), list(prefix), ...)
- it enables us to do refactorings across all object stores

One thing we do have there which handles object stores/filesystems even outside Hadoop is a set of public compliance tests and a fairly strict specification of what a filesystem is meant to do; it means we can handle a big contrib by getting the authors to have those tests working and regression tests going. But the bindings do need active engagement to keep alive; OpenStack has suffered a bit there, and there's now some fork in OpenStack itself: code follows maintenance; use drives maintenance.

Anyway, I digress. I've thought about this some more, and here are some points:

- If there's mutual code and/or tests shared between the Flink connectors and the Spark ones, there's a very strong case for putting the code into Bahir.
- If it's more that you need a home for things, I'd recommend you start with Apache Flink, and if there are big contributions that suffer neglect, then it'll be time to look for a home.

In the meantime, maybe Bahir artifacts should explicitly indicate that they are for Spark, e.g. bahir-spark, so as to leave the option of having, say, a bahir-flink artifact at some point in the future.
---------- Forwarded message ----------
From: Luciano Resende <luckbr1...@gmail.com>
To: dev@bahir.apache.org
Date: Fri, 12 Aug 2016 11:28:36 -0700
Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir

On Thu, Aug 11, 2016 at 2:18 PM, Steve Loughran <ste...@apache.org> wrote:

> To an extent, Bahir is currently mostly a home for some connectors and
> things which were orphaned by the main spark team, giving them some ASF
> home.

This was what originated Bahir, but we are already starting to see original extensions being built by the Bahir community. What we see today is a few distributed analytics platforms that focus on building the runtime and maybe a few reference-implementation extensions, while most extensions are built by individuals in their own GitHub repositories. Bahir enables these extensions to build a community around them and follow Apache governance, and it's open to non-Spark extensions.

> Having some orphan stuff isn't ideal; it's the perennial "contrib"
> problem of projects.

I don't think that collaborating around Flink extensions in Bahir implies that these extensions are orphans. Bahir can give these extensions a lot of flexibility. One is release flexibility: the extensions could follow the extension source's release cycle (e.g. Kafka's), the platform's release cycle (e.g. Flink's), or both, which is more complicated when they are collocated within the platform code. Another benefit is the sharing of domain expertise: Kafka experts, for example, could collaborate across extensions on different platforms.

> -if there's mutual code and/or tests related to flink connectors and the
> spark ones, there's a very strong case for putting the code into bahir

IMHO, even if there isn't, there are still benefits, some of which I have described above.

> -if it's more that you need a home for things, I'd recommend you start with
> Apache Flink and if there are big contributions that suffer neglect then
> it'll be time to look for a home

Well, I would say: if you need a more flexible place to host these extensions, Bahir would welcome you. Having said that, we are expecting that the Flink community would be responsible for maintaining these extensions with help from the Bahir community. Note that we have also defined some guidelines for retiring extensions (http://bahir.apache.org/contributing-extensions/), which will be used in case of orphaned code.

> in the meantime, maybe bahir artifacts should explicitly indicate that they
> are for spark, eg bahir-spark, so as to leave the option for having, say, a
> bahir-flink artifact at some point in the future.

Currently, all artifact ids are prefixed by spark:
<artifactId>spark-streaming-akka_2.11</artifactId>

--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

---------- Forwarded message ----------
From: Luciano Resende <luckbr1...@gmail.com>
To: dev@bahir.apache.org
Date: Fri, 12 Aug 2016 11:34:25 -0700
Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir

I have thought more about the question of one combined repository versus separate repositories per platform (e.g. Spark, Flink), and the more I think about it, the more I believe two repositories will be best.
Consider some of the benefits listed below.

Multiple repositories:
- Enable smaller and faster builds, as you don't have to wait on the other platform's extensions
- Simplify dependency management when different platforms use different levels of dependencies
- Allow more flexibility on releases, permitting disruptive changes in one platform without affecting others
- Enable a better versioning scheme per platform (e.g. Spark following the Spark release version scheme, while Flink has its own)

One repository:
- Enables sharing common components (which in my view will be mostly infrastructure pieces that, once created, are somewhat stable)

Thoughts?

--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

---------- Forwarded message ----------
From: Robert Metzger <rmetz...@apache.org>
To: dev@bahir.apache.org
Date: Mon, 15 Aug 2016 14:04:09 +0200
Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir

Hi,

@stevel: Flink is still experiencing a lot of community growth. Initially, we accepted all contributions in an acceptable state. Then we introduced various models of "staging" and "contrib" modules, but by now the amount of incoming contributions is just too high for the core project. Also, it's a bit out of scope compared to the core engine we are building. That's why we started looking at Bahir (and other approaches).

@Luciano: I'll answer the multiple-vs-one-repo discussion inline below.

> Multiple Repositories:
> - Enable smaller and fast builds, as you don't have to wait on the other
> platform extensions

True, build time is an argument for multiple repos.

> - Simplify dependency management when different platforms use different
> levels of dependencies

I don't think the dependencies influence each other much. For the one-repository approach, the structure would probably look like this:

bahir-parent
  - bahir-spark
    - spark-streaming-akka
    - ...
  - bahir-flink
    - flink-connector-redis
    - ...

In "bahir-parent", we could define all release-related plugins, apache rat, checkstyle(?), general project information, and all the other stuff that makes a Bahir project "bahir" ;) In the "bahir-<system>" parent, we could define all platform-specific dependencies and settings.

> - Enable for more flexibility on releases, permitting disruptive changes in
> one platform without affecting others

With the structure proposed above, I guess we could actually have independent versioning/releasing for each "bahir-<system>" parent tree.

Since you are the project PMC chair, I propose we go with the "multiple repositories" approach if nobody objects within 24 hours.
Once we have concluded our discussion here, I'll send a summary to the Flink dev@ list and see what they think about it. I expect them to agree with our proposals, since the "Bahir approach" is our favorite.

Regards,
Robert
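[Editor's note] For illustration, the one-repository layout Robert sketches above could translate into a Maven aggregator POM roughly like the following. This is a hypothetical sketch only: the module names, groupId, and version are assumptions, not actual Bahir artifacts.

```xml
<!-- Hypothetical bahir-parent/pom.xml for the one-repository layout
     discussed in the thread; names and versions are illustrative only. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.bahir</groupId>
  <artifactId>bahir-parent</artifactId>
  <version>2.0.0-SNAPSHOT</version>
  <packaging>pom</packaging>

  <!-- Shared release/quality tooling (apache-rat, checkstyle, release
       plugins) would be configured once here and inherited below. -->
  <modules>
    <module>bahir-spark</module>  <!-- parent for spark-streaming-akka, ... -->
    <module>bahir-flink</module>  <!-- parent for flink-connector-redis, ... -->
  </modules>
</project>
```

Under this layout, each `bahir-<system>` child is itself a `pom`-packaged aggregator holding platform-specific dependency management, which is what would allow the independent versioning per platform tree that Robert mentions.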