Hey all,

Great to see this discussion. I'm part of the Flink PMC and would love
to see some of Flink's connectors added to Bahir. I can also help
Robert with maintenance on the Flink side of things.

+1 to the multiple-repo approach

Best,

Ufuk

On Tue, Aug 16, 2016 at 2:27 PM,  <dev-h...@bahir.apache.org> wrote:
>
> dev Digest of: thread.362
>
>
> [DISCUSS] Adding streaming connectors from Apache Flink to Bahir
>         362 by: Robert Metzger
>         363 by: Steve Loughran
>         370 by: Luciano Resende
>         371 by: Robert Metzger
>         374 by: Luciano Resende
>         376 by: Ted Yu
>         377 by: Robert Metzger
>         380 by: Steve Loughran
>         381 by: Luciano Resende
>         382 by: Luciano Resende
>         384 by: Robert Metzger
>
> ----------------------------------------------------------------------
>
>
>
> ---------- Forwarded message ----------
> From: Robert Metzger <rmetz...@apache.org>
> To: dev@bahir.apache.org
> Cc:
> Date: Thu, 11 Aug 2016 10:54:17 +0200
> Subject: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir
> Hello Bahir community,
>
> The Apache Flink community is currently discussing how to handle incoming
> (streaming) connector contributions [1].
> The Flink community wants to limit the maintained connectors to the most
> popular ones, but we don't want to reject valuable code contributions
> without offering a good alternative.
> Apache Bahir is one of the options we are currently discussing.
> From the Bahir announcement, I got the impression that the project is also
> open to connectors from projects other than Apache Spark.
>
> Initially, we would move some of our current connectors here (Redis, Flume,
> NiFi), and we would also redirect some pending contributions in Flink to
> Bahir.
>
> So what's your opinion on this?
>
>
> Regards,
> Robert
>
>
> [1]
> http://mail-archives.apache.org/mod_mbox/flink-dev/201608.mbox/%3CCAGr9p8CAN8KQTM6%2B3%2B%3DNv8M3ggYEE9gSqdKaKLQiWsWsKzZ21Q%40mail.gmail.com%3E
>
>
> ---------- Forwarded message ----------
> From: Steve Loughran <ste...@apache.org>
> To: dev@bahir.apache.org
> Cc:
> Date: Thu, 11 Aug 2016 11:04:26 +0200
> Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir
> I can see benefits from this —provided we get some help from the Flink
> people in maintaining and testing the stuff.
>
>
> ---------- Forwarded message ----------
> From: Luciano Resende <luckbr1...@gmail.com>
> To: dev@bahir.apache.org
> Cc:
> Date: Thu, 11 Aug 2016 04:50:12 -0700
> Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir
> On Thu, Aug 11, 2016 at 2:04 AM, Steve Loughran <ste...@apache.org> wrote:
>
>> I can see benefits from this —provided we get some help from the Flink
>> people in maintaining and testing the stuff.
>>
>
> +1, Let me know when you guys are ready and I can create a bahir-flink git
> repository.
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>
>
> ---------- Forwarded message ----------
> From: Robert Metzger <rmetz...@apache.org>
> To: dev@bahir.apache.org
> Cc:
> Date: Thu, 11 Aug 2016 14:42:33 +0200
> Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir
> @Steve: The plan is that Flink committers also help out here with
> reviewing, releasing and other community activities (but I suspect the
> activity will be much lower; otherwise, we would not be discussing removing
> some of the connectors from Flink)
>
> @Luciano: So the idea is to have separate repositories for each project
> contributing connectors?
> I'm wondering if it makes sense to keep the code in the same repository to
> have some synergies (like the release scripts, CI, documentation, a common
> parent pom with rat etc.). Otherwise, it would maybe make more sense to
> create a Bahir-style project for Flink, to avoid maintaining completely
> disjoint codebases in the same JIRA, ML, ...
>
>
> ---------- Forwarded message ----------
> From: Luciano Resende <luckbr1...@gmail.com>
> To: dev@bahir.apache.org
> Cc:
> Date: Thu, 11 Aug 2016 09:03:39 -0700
> Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir
> On Thu, Aug 11, 2016 at 5:42 AM, Robert Metzger <rmetz...@apache.org> wrote:
>
>>
>>
>> @Luciano: So the idea is to have separate repositories for each project
>> contributing connectors?
>> I'm wondering if it makes sense to keep the code in the same repository to
>> have some synergies (like the release scripts, CI, documentation, a common
>> parent pom with rat etc.). Otherwise, it would maybe make more sense to
>> create a Bahir-style project for Flink, to avoid maintaining completely
>> disjoint codebases in the same JIRA, ML, ...
>>
>>
>>
> But we most likely would have very different release schedules for the
> different sets of extensions, where Spark extensions will tend to follow the
> Spark release cycles, and Flink extensions the Flink release cycles. As for
> the overhead, I believe release scripts might be the one piece that would be
> replicated, but I can take on that infrastructure overhead for now. All the
> rest, such as JIRA, ML, etc., will be common. But anyway, I don't want to
> make this an obstacle to Flink bringing its extensions here, so if you have
> a strong preference for having everything in the same repo, we could start
> with that.
>
> Thoughts?
>
>
> ---------- Forwarded message ----------
> From: Ted Yu <yuzhih...@gmail.com>
> To: dev@bahir.apache.org
> Cc:
> Date: Thu, 11 Aug 2016 09:13:24 -0700
> Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir
> Having Flink connectors in the same repo seems to make more sense at the
> moment.
>
> Certain artifacts can be shared between the two types of connectors.
>
> Flink seems to have more frequent releases recently. But Bahir doesn't have
> to follow each Flink patch release.
>
> Just my two cents.
>
>
> ---------- Forwarded message ----------
> From: Robert Metzger <rmetz...@apache.org>
> To: dev@bahir.apache.org
> Cc:
> Date: Thu, 11 Aug 2016 20:41:00 +0200
> Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir
> Thank you for your responses.
>
> @Luciano: I don't have a strong preference for one of the two options, but
> I would like to understand the implications of the two before we start
> setting up the infrastructure.
> Regarding the release cycle: For the Flink connectors, I would actually try
> to make the release cycle dependent on the connectors, not so much on Flink
> itself. In my experience, connectors could benefit from a more frequent
> release schedule. For example, Kafka seems to release new versions quite
> frequently (at least recently), or at least the release cycles of Kafka and
> Flink are not aligned ;)
> So maybe it would make sense for Bahir to release independently of the
> engine projects, on a monthly or bi-monthly schedule, with an independent
> versioning scheme.
>
> @Ted: Flink has bugfix releases quite frequently, but major releases happen
> at a reasonable cadence (3-4 months in between).
> Since 1.0.0, Flink provides interface stability, so there should not be an
> issue with independent connector releases.
>
>
> ---------- Forwarded message ----------
> From: Steve Loughran <ste...@apache.org>
> To: dev@bahir.apache.org
> Cc:
> Date: Thu, 11 Aug 2016 23:18:32 +0200
> Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir
> Thinking some more
>
> To an extent, Bahir is currently mostly a home for some connectors and
> things which were orphaned by the main spark team, giving them some ASF
> home. Luciano has been putting in lots of work getting a release out in
> sync with the spark release.
>
> I have some plans to contribute some other things related to spark in
> there, so again, an ASF home and a test & release process (some YARN driver
> plugins, for ATS integration and another I have a plan to write for YARN
> registry binding). Again, some stuff unloved by the core spark team.
>
> Ideally, Flink should be growing its user/dev base, recruiting everyone who
> wants to get patches in and getting them to work on those JIRAs. That's the
> community growth part of an ASF project. Having some orphan stuff isn't
> ideal; it's the perennial "contrib" problem of projects.(*)
>
> Hadoop had a big purge of contrib stuff in the move to hadoop 2 & maven,
> though we've been adding stuff in hadoop-tools, especially related to
> object stores and things. There's now a fairly harsh-but-needed policy
> there: no contributions which can't be tested during a release. It's a PITA
> as for some code changes I need to test against: AWS S3, Azure, 2x
> OpenStack endpoints and soon a chinese one. We could have been harsh and
> said "stay on github" but having it in offers some benefits
>  -synchronized release schedule (good for Hadoop; bad if the contributors
> want to release more frequently)
>  -hadoop team gets some control over what's going on there.
>  -code review process lets us improve quality; we're getting metrics in &c.
>  -works well with my plan to have an explicit object store API, extending
> FileSystem with specific and efficient blobstore ops (put(),
> list(prefix),..)
>  -enables us to do refactorings across all object stores
>
> One thing we do have there which handles object stores/filesystems even
> outside Hadoop is a set of public compliance tests and a fairly strict
> specification of what a filesystem is meant to do; it means we can handle a
> big contrib by getting the authors to have those tests working, have
> regression tests going. But...the bindings do need active engagement to
> keep alive; openstack has suffered a bit there, and there's now some fork
> in openstack itself: code follows maintenance; use drives maintenance.
>
> Anyway, I digress
>
> I've thought about this some more and here are some points
>
> -if there's mutual code and/or tests related to flink connectors and the
> spark ones, there's a very strong case for putting the code into bahir
> -if it's more that you need a home for things, I'd recommend you start with
> Apache Flink and if there are big contributions that suffer neglect then
> it'll be time to look for a home
>
> in the meantime, maybe bahir artifacts should explicitly indicate that they
> are for spark, eg bahir-spark, so as to leave the option for having, say, a
> bahir-flink artifact at some point in the future.
>
>
> ---------- Forwarded message ----------
> From: Luciano Resende <luckbr1...@gmail.com>
> To: dev@bahir.apache.org
> Cc:
> Date: Fri, 12 Aug 2016 11:28:36 -0700
> Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir
> On Thu, Aug 11, 2016 at 2:18 PM, Steve Loughran <ste...@apache.org> wrote:
>
>> Thinking some more
>>
>> To an extent, Bahir is currently mostly a home for some connectors and
>> things which were orphaned by the main spark team, giving them some ASF
>> home. Luciano has been putting in lots of work getting a release out in
>> sync with the spark release.
>>
>
> This was what originated Bahir, but we are already starting to see original
> extensions being built by the Bahir community.
> What we see today is a few distributed analytics platforms that focus on
> building the runtime and maybe a few reference-implementation extensions,
> while most extensions are built by individuals in their own GitHub
> repositories. Bahir enables these extensions to build a community around
> them and follow Apache governance, and it is open to non-Spark extensions.
>
>
>>
>> I have some plans to contribute some other things related to spark in
>> there, so again, an ASF home and a test & release process (some YARN driver
>> plugins, for ATS integration and another I have a plan to write for YARN
>> registry binding). Again, some stuff unloved by the core spark team.
>>
>> Ideally, Flink should be growing its user/dev base, recruiting everyone who
>> wants to get patches in and getting them to work on those JIRAs. That's the
>> community growth part of an ASF project. Having some orphan stuff isn't
>> ideal; it's the perennial "contrib" problem of projects.(*)
>>
>>
> I don't think that collaborating around Flink extensions in Bahir implies
> that these extensions are orphans. Bahir can give a lot of flexibility to
> these extensions: one example is release flexibility, where an extension
> could follow the release cycle of its source system (e.g. Kafka), the
> release cycle of the platform (e.g. Flink), or both, which is more
> complicated when they are co-located within the platform code. Another
> benefit is the sharing of domain expertise; Kafka experts, for example,
> could collaborate across extensions for different platforms, etc.
>
>
>> Hadoop had a big purge of contrib stuff in the move to hadoop 2 & maven,
>> though we've been adding stuff in hadoop-tools, especially related to
>> object stores and things. There's now a fairly harsh-but-needed policy
>> there: no contributions which can't be tested during a release. It's a PITA
>> as for some code changes I need to test against: AWS S3, Azure, 2x
>> OpenStack endpoints and soon a chinese one. We could have been harsh and
>> said "stay on github" but having it in offers some benefits
>>  -synchronized release schedule (good for Hadoop; bad if the contributors
>> want to release more frequently)
>>  -hadoop team gets some control over what's going on there.
>>  -code review process lets us improve quality; we're getting metrics in &c.
>>  -works well with my plan to have an explicit object store API, extending
>> FileSystem with specific and efficient blobstore ops (put(),
>> list(prefix),..)
>>  -enables us to do refactorings across all object stores
>>
>> One thing we do have there which handles object stores/filesystems even
>> outside Hadoop is a set of public compliance tests and a fairly strict
>> specification of what a filesystem is meant to do; it means we can handle a
>> big contrib by getting the authors to have those tests working, have
>> regression tests going. But...the bindings do need active engagement to
>> keep alive; openstack has suffered a bit there, and there's now some fork
>> in openstack itself: code follows maintenance; use drives maintenance.
>>
>> Anyway, I digress
>>
>> I've thought about this some more and here are some points
>>
>> -if there's mutual code and/or tests related to flink connectors and the
>> spark ones, there's a very strong case for putting the code into bahir
>>
>
> IMHO, even if there isn't, I believe there are still benefits, some of
> which I have described above.
>
>
>> -if it's more that you need a home for things, I'd recommend you start with
>> Apache Flink and if there are big contributions that suffer neglect then
>> it'll be time to look for a home
>>
>>
> Well, I would say, if you need a more flexible place to host these
> extensions, Bahir would welcome you.
>
> Having said that, we expect that the Flink community would be responsible
> for maintaining these extensions with the help of the Bahir community. Note
> that we have also defined some guidelines for retiring extensions:
> http://bahir.apache.org/contributing-extensions/, which will be used in the
> case of orphaned code.
>
>
>> in the meantime, maybe bahir artifacts should explicitly indicate that they
>> are for spark, eg bahir-spark, so as to leave the option for having, say, a
>> bahir-flink artifact at some point in the future.
>>
>
> Currently, all artifact IDs are prefixed with spark:
> <artifactId>spark-streaming-akka_2.11</artifactId>
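>
> A bahir-flink repository could follow the same convention. The artifact
> below is a hypothetical example just to illustrate the naming; nothing like
> it is published yet:
>
>   <!-- hypothetical dependency, shown only to illustrate the naming -->
>   <dependency>
>     <groupId>org.apache.bahir</groupId>
>     <artifactId>flink-connector-redis_2.11</artifactId>
>     <version>1.0.0-SNAPSHOT</version>
>   </dependency>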
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>
>
> ---------- Forwarded message ----------
> From: Luciano Resende <luckbr1...@gmail.com>
> To: dev@bahir.apache.org
> Cc:
> Date: Fri, 12 Aug 2016 11:34:25 -0700
> Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir
> I have thought more about the question of one combined repository versus
> separate repositories per platform (e.g. Spark, Flink), and the more I think
> about it, the more I believe two repositories will be best. Consider some of
> the benefits listed below:
>
> Multiple Repositories:
> - Enable smaller and faster builds, as you don't have to wait on the other
> platform extensions
> - Simplify dependency management when different platforms use different
> levels of dependencies
> - Enable more flexibility on releases, permitting disruptive changes in
> one platform without affecting others
> - Enable a better versioning scheme for different platforms (e.g. Spark
> extensions following the Spark release version scheme, while Flink has its
> own scheme)
> - etc
>
> One Repository
> - Enable sharing common components (which in my view will be mostly
> infrastructure pieces that once created are somewhat stable)
>
> Thoughts?
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>
>
> ---------- Forwarded message ----------
> From: Robert Metzger <rmetz...@apache.org>
> To: dev@bahir.apache.org
> Cc:
> Date: Mon, 15 Aug 2016 14:04:09 +0200
> Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir
> Hi,
>
> @stevel: Flink is still experiencing a lot of community growth. Initially,
> we accepted all contributions in an acceptable state. Then, we introduced
> various models of "staging" and "contrib" modules, but by now, the amount
> of incoming contributions is just too high for the core project.
> Also, it's a bit out of scope compared to the core engine we are building.
> That's why we started looking at Bahir (and other approaches).
>
> @Luciano: I'll reply to the multiple vs. one repo discussion inline below.
>
>
> On Fri, Aug 12, 2016 at 8:34 PM, Luciano Resende <luckbr1...@gmail.com>
> wrote:
>
>> On Thu, Aug 11, 2016 at 9:03 AM, Luciano Resende <luckbr1...@gmail.com>
>> wrote:
>>
>> Multiple Repositories:
>> - Enable smaller and faster builds, as you don't have to wait on the other
>> platform extensions
>>
>
> True, build time is an argument for multiple repos.
>
>
>> - Simplify dependency management when different platforms use different
>> levels of dependencies
>>
>
> I don't think that the dependencies influence each other much.
> For the one repository approach, the structure would probably be like this:
>
> bahir-parent
> - bahir-spark
>     - spark-streaming-akka
>     - ...
> - bahir-flink
>     - flink-connector-redis
>     - ...
>
> In "bahir-parent", we could define all release-related plugins, Apache Rat,
> maybe Checkstyle, general project information, and all the other stuff that
> makes a Bahir project "Bahir" ;)
> In the "bahir-<system>" parent, we could define all platform-specific
> dependencies and settings.
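>
> Just to sketch what that could look like in Maven terms (the module names,
> groupId and version below are only illustrative; nothing is set up yet):
>
>   <!-- bahir-parent/pom.xml: shared release plugins, rat, checkstyle, ... -->
>   <project>
>     <modelVersion>4.0.0</modelVersion>
>     <groupId>org.apache.bahir</groupId>
>     <artifactId>bahir-parent</artifactId>
>     <version>1.0.0-SNAPSHOT</version>
>     <packaging>pom</packaging>
>     <modules>
>       <module>bahir-spark</module>
>       <module>bahir-flink</module>
>     </modules>
>   </project>
>
>   <!-- bahir-flink/pom.xml: Flink-specific dependencies and settings -->
>   <project>
>     <modelVersion>4.0.0</modelVersion>
>     <parent>
>       <groupId>org.apache.bahir</groupId>
>       <artifactId>bahir-parent</artifactId>
>       <version>1.0.0-SNAPSHOT</version>
>     </parent>
>     <artifactId>bahir-flink</artifactId>
>     <packaging>pom</packaging>
>     <modules>
>       <module>flink-connector-redis</module>
>     </modules>
>   </project>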
>
>
>
>> - Enable more flexibility on releases, permitting disruptive changes in
>> one platform without affecting others
>>
>
> With the structure proposed above, I guess we could actually have
> independent versioning / releases for the "bahir-<system>" parent tree.
>
>
>> - Enable a better versioning scheme for different platforms (e.g. Spark
>> extensions following the Spark release version scheme, while Flink has its
>> own scheme)
>> - etc
>>
>> One Repository
>> - Enable sharing common components (which in my view will be mostly
>> infrastructure pieces that once created are somewhat stable)
>>
>>
>
> Since you are the project PMC chair, I propose that we go for the "multiple
> repositories" approach if nobody objects within 24 hours.
>
> Once we have concluded our discussion here, I'll send a summary to the
> Flink dev@ list and see what they think about it.
> I expect them to agree with our proposals, since the "Bahir approach" is our
> favorite.
>
> Regards,
> Robert
>
