Re: Breaking up Flume

Ralph Goers Mon, 28 Mar 2022 09:43:26 -0700

Thanks for the reply! 

In general I agree with what you are proposing. I’d probably suggest once a 
quarter instead of every 2 months. I also wouldn’t necessarily have a release 
of every component every quarter. If there have been no changes there isn’t 
much of a point. And requiring that everything be released together doesn’t 
really help. I would suggest that Flume would have a flume-parent module that 
includes a parent pom.xml that all projects would inherit from. It would 
include a dependency management section that declares the versions of the 
dependencies that are used across projects. In addition we would want a 
flume-bom that contains a pom.xml that includes a dependency management section 
declaring all the versions of all components for a specific Flume quarterly 
release.
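
As a rough sketch of what the flume-bom pom.xml could look like (the component 
artifact IDs and all version numbers here are placeholders for illustration, 
not agreed names or actual releases):

```xml
<!-- flume-bom/pom.xml: illustrative sketch only.
     Component artifact IDs and versions are hypothetical placeholders. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.flume</groupId>
  <artifactId>flume-bom</artifactId>
  <version>2.0.0-alpha</version>
  <packaging>pom</packaging>

  <dependencyManagement>
    <dependencies>
      <!-- One entry per separately released component (names assumed) -->
      <dependency>
        <groupId>org.apache.flume</groupId>
        <artifactId>flume-hdfs</artifactId>
        <version>2.0.0-alpha</version>
      </dependency>
      <dependency>
        <groupId>org.apache.flume</groupId>
        <artifactId>flume-kafka</artifactId>
        <version>2.0.0-alpha</version>
      </dependency>
    </dependencies>
  </dependencyManagement>
</project>
```

A user would then import the BOM in their own dependency management section 
(type pom, scope import) and declare the Flume components they need without 
specifying versions. A component that had no changes in a given quarter would 
simply keep its previous version in the next BOM.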


As for the versions, I am not sure why you wouldn’t just go with 2.0.0-alpha, 
2.0.0-beta or 2.0.0-beta1, 2.0.0-beta2 if you aren’t comfortable labeling them 
as GA. Once things are stable you would then release 2.0.0.

Ralph

> On Mar 28, 2022, at 7:24 AM, Sean Busbey <[email protected]> wrote:
> 
> That’s a really interesting possibility.
> 
> For the 1.10 release I think we should still upgrade the Hive 1 version to 
> the latest 1.y available, but I agree we’d be well served to get a handle on 
> the increasing set of possible dependencies. A 2.0 release would be a great 
> time to change around how deployment works so that folks don’t expect 
> everything to show up in a single omnibus tarball from a single build as they 
> do now.
> 
> There are a lot of things to take care of to make that transition less 
> painful, so I’d suggest we get an overall approach described but try to 
> address it incrementally so we’re not facing a very long delay for further 
> project releases.
> 
> How about something like this?
> 
> - Release 1.10.0 soon, only backward compatible releases
> - Release 1.y.0 - every other month, backward compatible dependency updates 
> and bug fixes
> - Release 2.0 alpha - break up project into multiple repos, establish release 
> cadence(s) w/o binary artifacts
> - Release 2.1 beta - have an “easy path” convenience binary
> - Release 2.2 expected to be production ready
> 
> For at least those parts of the process that don’t require project svn access 
> I can help with keeping regular 1.y maintenance releases going. We could 
> decide ahead of time on when to stop them; e.g. 6 months after the first 
> “production ready” flume 2.y release.
> 
> For the 2.y releases, I think we’re going to have some growing pains in 
> managing how we get from multiple repositories to PMC blessed releases and 
> from there to artifacts someone could use to run flume if they’re used to our 
> current deployment model. Setting expectations via alpha/beta labels and 
> stated packaging goals means we should be able to work out friction points 
> while still walking before we try to run with a long term sustainable path 
> for the project. We could try to put some goal dates on those milestones once 
> we have spent some time discussing details and trying to move things forward.
> 
>> On Mar 27, 2022, at 4:19 AM, Ralph Goers <[email protected]> wrote:
>> 
>> Sean, (and everyone else)
>> 
>> You mentioned that you want to create separate Maven modules to upgrade Hive 
>> & HBase. The Flume build is already very large. In addition, upgrading to 
>> Hive 3 looks like it will require Hadoop 3 while Hive 2 runs with Hadoop 2. 
>> This means both dependencies would need to be in the parent pom. I find this 
>> problematic for the following reasons:
>> - Flume contains a ton of dependencies and even more transitive dependencies 
>>   that are not declared. This makes creating new releases really hard given 
>>   how many dependencies have to be checked and upgraded.
>> - As more modules are added the build is just going to get slower.
>> - Some modules have dependencies on things that are no longer supported. 
>>   Again, that makes creating a full Flume release hard.
>> 
>> I would suggest that unless security fixes require it we hold off on 
>> creating upgrades in 1.10.0 for HBase and Hive beyond what you have already 
>> done. Instead, we should create new repositories for the parts of Flume we 
>> want to separate and maintain independently. The HBase and Hive upgrades 
>> would end up going there.
>> 
>> I believe this will speed up development since builds will no longer take so 
>> long. It also means that PRs will go against the target repo which should 
>> simplify things. Jira would remain the same as it is today. The component 
>> would be used to identify the target repo.
>> 
>> I would suggest that what should remain in the main Flume build would be 
>> primarily configuration, core, node, sdk, and some of configfilters. I 
>> would expect we would have separate repos for hbase, hdfs, hive, Kafka, 
>> embedded-agent, tools, and legacy to start.
>> 
>> Thoughts?
>> 
>> Ralph
> 
>
