Re: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir

Luciano Resende Fri, 12 Aug 2016 11:29:27 -0700

On Thu, Aug 11, 2016 at 2:18 PM, Steve Loughran <[email protected]> wrote:


> Thinking some more
>
> To an extent, Bahir is currently mostly a home for some connectors and
> things which were orphaned by the main spark team, giving them some ASF
> home. Luciano has been putting in lots of work getting a release out in
> sync with the spark release.
>

That is how Bahir originated, but we are already starting to see original
extensions being built by the Bahir community.
What we see today is a few distributed analytics platforms that focus on
building the runtime and maybe a few reference extensions, while most
extensions are built by individuals in their own GitHub repositories.
Bahir enables these extensions to build a community around them and follow
Apache governance, and it is open to non-Spark extensions.


>
> I have some plans to contribute some other things related to spark in
> there, so again, an ASF home and a test & release process (some YARN driver
> plugins, for ATS integration and another I have a plan to write for YARN
> registry binding). Again, some stuff unloved by the core spark team.
>
> Ideally, Flink should be growing its user/dev base, recruiting everyone who
> wants to get patches in and getting them to work on those JIRAs. That's the
> community growth part of an ASF project. Having some orphan stuff isn't
> ideal; it's the perennial "contrib" problem of projects.(*)
>
>
I don't think that collaborating around Flink extensions in Bahir implies
that these extensions are orphans. Bahir can give these extensions a lot of
flexibility. One benefit is release flexibility: the extensions could
follow the release cycle of the extension's source system (e.g. the Kafka
release cycle), the platform release cycle (e.g. Flink), or both, which is
more complicated when they are co-located within the platform code. Another
benefit is shared domain expertise: Kafka experts, for example, could
collaborate across extensions on different platforms.


> Hadoop had a big purge of contrib stuff in the move to hadoop 2 & maven,
> though we've been adding stuff in hadoop-tools, especially related to
> object stores and things. There's now a fairly harsh-but-needed policy
> there: no contributions which can't be tested during a release. It's a PITA
> as for some code changes I need to test against: AWS S3, Azure, 2x
> OpenStack endpoints and soon a chinese one. We could have been harsh and
> said "stay on github" but having it in offers some benefits
>  -synchronized release schedule (good for Hadoop; bad if the contributors
> want to release more frequently)
>  -hadoop team gets some control over what's going on there.
>  -code review process lets us improve quality; we're getting metrics in &c.
>  -works well with my plan to have an explicit object store API, extending
> FileSystem with specific and efficient blobstore ops (put(),
> list(prefix),..)
>  -enables us to do refactorings across all object stores
>
> One thing we do have there which handles object stores/filesystems even
> outside Hadoop is a set of public compliance tests and a fairly strict
> specification of what a filesystem is meant to do; it means we can handle a
> big contrib by getting the authors to have those tests working, have
> regression tests going. But...the bindings do need active engagement to
> keep alive; openstack has suffered a bit there, and there's now some fork
> in openstack itself: code follows maintenance; use drives maintenance.
>
> Anyway, I digress
>
> I've thought about this some more and here are some points
>
> -if there's mutual code and/or tests related to flink connectors and the
> spark ones, there's a very strong case for putting the code into bahir
>

IMHO, even if there isn't, I believe there are still benefits, some of
which I have described above.


> -if it's more that you need a home for things, I'd recommend you start with
> Apache Flink and if there are big contributions that suffer neglect then
> it'll be time to look for a home
>
>
Well, I would say, if you need a more flexible place to host these
extensions, Bahir would welcome you.

Having said that, we expect that the Flink community would be responsible
for maintaining these extensions with the help of the Bahir community. Note
that we have also defined some guidelines for retiring extensions:
http://bahir.apache.org/contributing-extensions/ which will be used in case
of orphaned code.


> in the meantime, maybe bahir artifacts should explicitly indicate that they
> are for spark, eg bahir-spark, so as to leave the option for having, say, a
> bahir-flink artifact at some point in the future.
>

Currently, all artifact ids are prefixed by spark:
<artifactId>spark-streaming-akka_2.11</artifactId>



>
>
>
> On 11 August 2016 at 14:42, Robert Metzger <[email protected]> wrote:
>
> > @Steve: The plan is that Flink committers also help out here with
> > reviewing, releasing and other community activities (but I suspect the
> > activity will be much lower, otherwise, we would not be discussing
> > removing some of the connectors from Flink)
> >
> > @Luciano: So the idea is to have separate repositories for each project
> > contributing connectors?
> > I'm wondering if it makes sense to keep the code in the same repository
> > to have some synergies (like the release scripts, CI, documentation, a
> > common parent pom with rat etc.). Otherwise, it would maybe make more
> > sense to create a Bahir-style project for Flink, to avoid maintaining
> > completely disjunct codebases in the same JIRA, ML, ...
> >
> >
> > On Thu, Aug 11, 2016 at 1:50 PM, Luciano Resende <[email protected]>
> > wrote:
> >
> > > On Thu, Aug 11, 2016 at 2:04 AM, Steve Loughran <[email protected]>
> > > wrote:
> > >
> > > > I can see benefits from this, provided we get some help from the
> > > > Flink people in maintaining and testing the stuff.
> > > >
> > >
> > > +1, Let me know when you guys are ready and I can create a
> > > bahir-flink git repository.
> > >
> > >
> > > --
> > > Luciano Resende
> > > http://twitter.com/lresende1975
> > > http://lresende.blogspot.com/
> > >
> >
>



-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
