Thinking some more

To an extent, Bahir is currently mostly a home for connectors and other
things which were orphaned by the main Spark team, giving them an ASF
home. Luciano has been putting in lots of work getting a release out in
sync with the Spark release.

I plan to contribute some other Spark-related things there: some YARN
driver plugins, one for ATS integration and another I plan to write for
YARN registry binding. So again, an ASF home and a test & release process.
And again, stuff unloved by the core Spark team.

Ideally, Flink should be growing its user/dev base, recruiting everyone who
wants to get patches in and getting them to work on those JIRAs. That's the
community growth part of an ASF project. Having some orphan stuff isn't
ideal; it's the perennial "contrib" problem of projects.(*)

Hadoop had a big purge of contrib stuff in the move to Hadoop 2 & Maven,
though we've been adding stuff in hadoop-tools, especially related to
object stores. There's now a fairly harsh-but-needed policy there: no
contributions which can't be tested during a release. It's a PITA, as for
some code changes I need to test against AWS S3, Azure, 2x OpenStack
endpoints and soon a Chinese one. We could have been harsh and said "stay
on GitHub", but having it in offers some benefits:
 -synchronized release schedule (good for Hadoop; bad if the contributors
want to release more frequently)
 -Hadoop team gets some control over what's going on there.
 -code review process lets us improve quality; we're getting metrics in &c.
 -works well with my plan to have an explicit object store API, extending
FileSystem with specific and efficient blobstore ops (put(),
list(prefix),..)
 -enables us to do refactorings across all object stores
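
A rough sketch of the kind of explicit object store API meant above. Python
is used only for brevity; the class names and the in-memory backing are
hypothetical, and nothing here is an existing Hadoop interface. Only put()
and list(prefix) come from the list above.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator


class ObjectStore(ABC):
    """Hypothetical blobstore interface; not an existing Hadoop API."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store the whole object under key in a single operation."""

    @abstractmethod
    def list(self, prefix: str) -> Iterator[str]:
        """Yield the keys that start with prefix, in sorted order."""


class InMemoryStore(ObjectStore):
    """Toy implementation backed by a dict, for illustration only."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # Objects are written whole; there is no append or rename here.
        self._objects[key] = bytes(data)

    def list(self, prefix: str) -> Iterator[str]:
        # A flat prefix scan, rather than a recursive directory walk.
        return iter(sorted(k for k in self._objects if k.startswith(prefix)))
```

The point of the sketch is that list(prefix) is one flat scan over keys,
which is what a blobstore does natively, instead of emulating directories.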

One thing we do have there which handles object stores/filesystems even
outside Hadoop is a set of public compliance tests and a fairly strict
specification of what a filesystem is meant to do; it means we can handle a
big contrib by getting the authors to have those tests working and to keep
regression tests going. But...the bindings do need active engagement to
keep alive; OpenStack has suffered a bit there, and there's now a fork of
sorts in OpenStack itself: code follows maintenance; use drives maintenance.
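
As an illustration of the compliance-test idea: one shared set of checks
that every binding has to pass. This is a minimal Python sketch, not the
actual Hadoop tests; check_store_contract, DictStore and the specific
rules checked are all assumptions made up for the example.

```python
def check_store_contract(make_store):
    """Run the same checks against any store built by make_store()."""
    store = make_store()

    # A missing key must read as None rather than raising.
    assert store.get("missing") is None

    # A put must be visible to a following get.
    store.put("dir/a", b"one")
    assert store.get("dir/a") == b"one"

    # A second put to the same key replaces the first.
    store.put("dir/a", b"two")
    assert store.get("dir/a") == b"two"

    # Deleting removes the object; deleting again is harmless.
    store.delete("dir/a")
    store.delete("dir/a")
    assert store.get("dir/a") is None
    return True


class DictStore:
    """Hypothetical binding under test, backed by a plain dict."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects.get(key)

    def delete(self, key):
        self._objects.pop(key, None)
```

A new binding would be accepted once check_store_contract passes against
it, and the same function then serves as its regression test.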

Anyway, I digress.

I've thought about this some more and here are some points:

-if there's shared code and/or tests between the Flink connectors and the
Spark ones, there's a very strong case for putting the code into Bahir
-if it's more that you need a home for things, I'd recommend you start with
Apache Flink, and if there are big contributions that suffer neglect then
it'll be time to look for a home

In the meantime, maybe Bahir artifacts should explicitly indicate that they
are for Spark, e.g. bahir-spark, so as to leave the option of having, say, a
bahir-flink artifact at some point in the future.




On 11 August 2016 at 14:42, Robert Metzger <[email protected]> wrote:

> @Steve: The plan is that Flink committers also help out here with
> reviewing, releasing and other community activities (but I suspect the
> activity will be much lower, otherwise, we would not be discussing removing
> some of the connectors from Flink)
>
> @Luciano: So the idea is to have separate repositories for each project
> contributing connectors?
> I'm wondering if it makes sense to keep the code in the same repository to
> have some synergies (like the release scripts, CI, documentation, a common
> parent pom with rat etc.). Otherwise, it would maybe make more sense to
> create a Bahir-style project for Flink, to avoid maintaining completely
> disjunct codebases in the same JIRA, ML, ...
>
>
> On Thu, Aug 11, 2016 at 1:50 PM, Luciano Resende <[email protected]> wrote:
>
> > On Thu, Aug 11, 2016 at 2:04 AM, Steve Loughran <[email protected]> wrote:
> >
> > > I can see benefits from this —provided we get some help from the Flink
> > > people in maintaining and testing the stuff.
> > >
> >
> > +1, Let me know when you guys are ready and I can create a bahir-flink
> > git repository.
> >
> >
> > --
> > Luciano Resende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
> >
>
