Re: Breaking up Flume

Sean Busbey Mon, 28 Mar 2022 07:24:57 -0700

That’s a really interesting possibility.

For the 1.10 release I think we should still upgrade the Hive 1 version to the 
latest 1.y available, but I agree we’d be well served to get a handle on the 
increasing set of possible dependencies. A 2.0 release would be a great time to 
change around how deployment works so that folks don’t expect everything to 
show up in a single omnibus tarball from a single build as they do now.

There are a lot of things to take care of to make that transition less 
painful, so I’d suggest we get an overall approach described but try to 
address it incrementally so we’re not facing a very long delay for further 
project releases.

How about something like this?

- Release 1.10.0 soon, only backward compatible releases
- Release 1.y.0 - every other month, backward compatible dependency updates and 
bug fixes
- Release 2.0 alpha - break up project into multiple repos, establish release 
cadence(s) w/o binary artifacts
- Release 2.1 beta - have an “easy path” convenience binary
- Release 2.2 expected to be production ready

For at least those parts of the process that don’t require project svn access I 
can help with keeping regular 1.y maintenance releases going. We could decide 
ahead of time on when to stop them; e.g. 6 months after the first “production 
ready” flume 2.y release.

For the 2.y releases, I think we’re going to have some growing pains in 
managing how we get from multiple repositories to PMC blessed releases and from 
there to artifacts someone could use to run flume if they’re used to our 
current deployment model. Setting expectations via alpha/beta labels and stated 
packaging goals means we should be able to work out friction points while still 
walking before we try to run with a long term sustainable path for the project. 
We could try to put some goal dates on those milestones once we have spent some 
time discussing details and trying to move things forward.

> On Mar 27, 2022, at 4:19 AM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
> 
> Sean, (and everyone else)
> 
> You mentioned that you want to create separate Maven modules to upgrade Hive 
> & HBase. The Flume build is already very large. In addition, upgrading to 
> Hive 3 looks like it will require Hadoop 3 while Hive 2 runs with Hadoop 2. 
> This means both dependencies would need to be in the parent pom. I find this 
> problematic for the following reasons:
> - Flume contains a ton of dependencies and even more transitive dependencies 
>   that are not declared. This makes creating new releases really hard given 
>   how many dependencies have to be checked and upgraded.
> - As more modules are added the build is just going to get slower.
> - Some modules have dependencies on things that are no longer supported. 
>   Again, that makes creating a full Flume release hard.
> 
> I would suggest that unless security fixes require it we hold off on creating 
> upgrades in 1.10.0 for HBase and Hive beyond what you have already done. 
> Instead, we should create new repositories for the parts of Flume we want to 
> separate and maintain independently. The HBase and Hive upgrades would end up 
> going there.
> 
> I believe this will speed up development since builds will no longer take so 
> long. It also means that PRs will go against the target repo which should 
> simplify things. Jira would remain the same as it is today. The component 
> would be used to identify the target repo.
> 
> I would suggest that what should remain in the main Flume build would be 
> primarily configuration, core, node, sdk, and some of configfilters. I 
> would expect we would have separate repos for hbase, hdfs, hive, Kafka, 
> embedded-agent, tools, and legacy to start.
> 
> Thoughts?
> 
> Ralph
